SUMMARY


Drawing causal inference with observational studies is the central pillar of many disciplines. 
One sufficient condition for identifying the causal effect is that the treatment-outcome relation- 
ship is unconfounded conditional on the observed covariates. It is often believed that the more 
covariates we condition on, the more plausible this unconfoundedness assumption is. This be- 
lief has had a huge impact on practical causal inference, suggesting that we should adjust for all 
pretreatment covariates. However, when there is unmeasured confounding between the treatment 
and outcome, estimators adjusting for some pretreatment covariate might have greater bias than 
estimators without adjusting for this covariate. This kind of covariate is called a bias amplifier, 
and includes instrumental variables that are independent of the confounder, and affect the out- 
come only through the treatment. Previously, theoretical results for this phenomenon have been 
established only for linear models. We fill in this gap in the literature by providing a general 
theory, showing that this phenomenon happens under a wide class of models satisfying certain 
monotonicity assumptions. We further show that when the treatment follows an additive or multi- 
plicative model conditional on the instrumental variable and the confounder, these monotonicity 
assumptions can be interpreted as the signs of the arrows of the causal diagrams. 


Some key words: Causal inference; Directed acyclic graph; Interaction; Monotonicity; Potential outcome 


1. INTRODUCTION 


Causal inference from observational data is an important but challenging problem for empiri- 
cal studies in many disciplines. Under the potential outcomes framework (Neyman, 1923[1990]; 
Rubin, 1974), the causal effects are defined as comparisons between the potential outcomes un- 
der treatment and control, averaged over a certain population of interest. One sufficient condition 
for nonparametric identification of the causal effects is the ignorability condition (Rosenbaum 
& Rubin, 1983), that the treatment is conditionally independent of the potential outcomes given 
those pretreatment covariates that confound the relationship between the treatment and outcome. 
To make this fundamental assumption as plausible as possible, many researchers suggest that
the set of collected pretreatment covariates should be as rich as possible. It is often believed
that “typically, the more conditional an assumption, the more generally acceptable it is” (Rubin,
2009), and therefore “in principle, there is little or no reason to avoid adjustment for a true
covariate, a variable describing subjects before treatment” (Rosenbaum, 2002, p. 76).

Fig. 1: Two Directed Acyclic Graphs. A is the treatment, and Y is the outcome of interest.

Simply adjusting for all pretreatment covariates (D'Agostino, 1998; Rosenbaum, 2002; Hirano
& Imbens, 2001), or the pretreatment criterion (VanderWeele & Shpitser, 2011), has a sound
justification from the viewpoint of design and analysis of randomized experiments. Cochran (1965),
citing Dorn (1953), suggested that the planner of an observational study should always ask him- 
self the question, “How would the study be conducted if it were possible to do it by controlled 
experimentation?” Following this classical wisdom, Rubin (2007, 2008a,b, 2009) argued that the 
design of observational studies should be in parallel with the design of randomized experiments, 
i.e., because we balance all pretreatment covariates in randomized experiments, we should also 
follow this pretreatment criterion and balance or adjust for all pretreatment covariates when de- 
signing observational studies. 

However, this pretreatment criterion can result in increased bias under certain data generating 
processes. We highlight two important classes of such data generating processes for which the 
pretreatment criterion may be problematic. The first class is captured by an example of Green- 
land & Robins (1986), in which conditioning on a pretreatment covariate invalidates the ignora- 
bility assumption and thus a conditional analysis is biased; yet the ignorability assumption holds 
unconditionally, so an analysis that ignores the covariate is unbiased. Several researchers have 
shown that this phenomenon is generic when the data are generated under the causal diagram 
in Figure 1(a). In this diagram, the ignorability assumption holds unconditionally but not con- 
ditionally (Pearl, 2000; Spirtes et al., 2000; Greenland, 2003; Pearl, 2009; Shrier, 2008, 2009; 
Sjölander, 2009; Ding & Miratrix, 2015). In Figure 1(a), a pretreatment covariate M is associated
with two independent unmeasured covariates U and U′, but M does not itself affect either
the treatment A or outcome Y. Because the corresponding causal diagram looks like the English 
letter M, this phenomenon is called M-Bias. 

The second class of processes, which constitute the subject of this paper, are represented by the 
causal diagram in Figure 1(b). Owing to confounding by the unmeasured common cause U of the 
treatment A and the outcome Y, both the analysis that adjusts and the analysis that fails to adjust 
for pretreatment measured covariates are biased. If the magnitude of the bias is larger when we 
adjust for a particular pretreatment covariate than when we do not, we refer to the covariate as a 
bias amplifier. Of particular interest is to determine the conditions under which an instrumental 
variable is a bias amplifier. An instrumental variable is a pretreatment covariate that is
independent of the confounder U and has no direct effect on the outcome except through its effect on
the treatment. The variable Z in Figure 1(b) is an example. Heckman & Navarro-Lozano (2004)
and Bhattacharya & Vogt (2012) showed numerically that when the treatment and outcome are 
confounded, adjusting for an instrumental variable can result in greater bias than the unadjusted 
estimator. Wooldridge theoretically demonstrated this in linear models in a technical report in 
2006, which was finally published as Wooldridge (2016). Because instrumental variables are 
often denoted by Z as in Figure 1(b), this phenomenon is called Z-Bias. 

The treatment assignment is a function of the instrumental variable, the unmeasured con- 
founder and some other independent random error, which are the three sources of variation of 
the treatment. If we adjust for the instrumental variable, the treatment variation is driven more by 
the unmeasured confounder, which could result in increased bias due to this confounder. Seem- 
ingly paradoxically, without adjusting for the instrumental variable, the observational study is 
more like a randomized experiment, and the bias due to confounding is smaller. Although ap- 
plied researchers (Myers et al., 2011; Walker, 2013; Brooks & Ohsfeldt, 2013; Ali et al., 2014) 
have confirmed through extensive simulation studies that this bias amplification phenomenon ex- 
ists in a wide range of reasonable models, definite theoretical results have been established only 
for linear models. We fill in this gap in the literature by showing that adjusting for an instrumental 
variable amplifies bias for estimating causal effects under a wide class of models satisfying cer- 
tain monotonicity assumptions. When the instrumental variable and the confounder have either 
no additive or no multiplicative interaction on the treatment, these assumptions can be interpreted 
as the signs of the arrows of the causal diagram (VanderWeele & Robins, 2010). However, we 
also show that there exist data generating processes under which an instrumental variable is not 
a bias amplifier. 


2. FRAMEWORK AND NOTATION 


We consider a binary treatment A, an instrumental variable Z, an unobserved confounder U,
and an outcome Y, with the joint distribution depicted by the causal diagram in Figure 1(b). Let
$\perp\!\!\!\perp$ denote conditional independence between random variables. Then the instrumental
variable Z in Figure 1(b) satisfies $Z \perp\!\!\!\perp U$, $Z \perp\!\!\!\perp Y \mid (A, U)$ and
$Z \not\perp\!\!\!\perp A$. We first discuss analysis conditional on observed pretreatment covariates X,
and comment on averaging over X in §6 and the Supplementary Material. We define the potential
outcomes of Y under treatment a as Y(a) (a = 1, 0).
The true average causal effect of A on Y for the population actually treated is

\[ \mathrm{ACE}_1^{\mathrm{true}} = E\{Y(1) \mid A = 1\} - E\{Y(0) \mid A = 1\}, \]

for the population who are actually in the control condition it is

\[ \mathrm{ACE}_0^{\mathrm{true}} = E\{Y(1) \mid A = 0\} - E\{Y(0) \mid A = 0\}, \]

and for the whole population it is

\[ \mathrm{ACE}^{\mathrm{true}} = E\{Y(1)\} - E\{Y(0)\}. \]


Define $m_a(u) = E(Y \mid A = a, U = u)$ to be the conditional mean of the outcome given the
treatment and confounder. As illustrated by Figure 1(b), because U suffices to control confounding
between A and Y, the ignorability assumption $A \perp\!\!\!\perp Y(a) \mid U$ holds for a = 0 and 1.
Therefore, according to $Y = AY(1) + (1 - A)Y(0)$, we have

\[
\begin{aligned}
\mathrm{ACE}_1^{\mathrm{true}} &= E(Y \mid A = 1) - \int m_0(u) F(du \mid A = 1), \\
\mathrm{ACE}_0^{\mathrm{true}} &= \int m_1(u) F(du \mid A = 0) - E(Y \mid A = 0), \\
\mathrm{ACE}^{\mathrm{true}} &= \int m_1(u) F(du) - \int m_0(u) F(du).
\end{aligned}
\]

The unadjusted estimator is the naive comparison between the treatment and control means

\[ \mathrm{ACE}^{\mathrm{unadj}} = E(Y \mid A = 1) - E(Y \mid A = 0). \]


Define $\mu_a(z) = E(Y \mid A = a, Z = z)$ as the conditional mean of the outcome given the
treatment and instrumental variable. Because the instrumental variable Z is also a pretreatment
covariate unaffected by the treatment, the usual strategy to adjust for all pretreatment covariates
suggests using the adjusted estimator for the population under treatment

\[ \mathrm{ACE}_1^{\mathrm{adj}} = E(Y \mid A = 1) - \int \mu_0(z) F(dz \mid A = 1), \]

for the population under control

\[ \mathrm{ACE}_0^{\mathrm{adj}} = \int \mu_1(z) F(dz \mid A = 0) - E(Y \mid A = 0), \]

and for the whole population

\[ \mathrm{ACE}^{\mathrm{adj}} = \int \mu_1(z) F(dz) - \int \mu_0(z) F(dz). \]


Surprisingly, for linear structural equation models on (Z,U, A, Y), previous theory demon- 
strated that the magnitudes of the biases of the adjusted estimators are no smaller than the unad- 
justed ones (Pearl, 2010, 2011, 2013; Wooldridge, 2016). The goal of the rest of our paper is to 
show that this phenomenon exists in more general scenarios. 
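To make the linear-model result concrete, the following worked equations sketch the standard argument in a simple linear structural model. The coefficient names ($\alpha$, $\gamma$, $\tau$, $\delta$) and the error terms are illustrative assumptions, not notation from this paper, and the treatment is taken to be continuous purely for this illustration.

```latex
% Hypothetical linear structural model with mutually uncorrelated, mean-zero
% Z, U, \varepsilon_A, \varepsilon_Y (all symbols are illustrative assumptions):
\[
A = \alpha Z + \gamma U + \varepsilon_A, \qquad
Y = \tau A + \delta U + \varepsilon_Y .
\]
% Unadjusted least-squares slope of Y on A:
\[
\frac{\mathrm{cov}(A, Y)}{\mathrm{var}(A)}
  = \tau + \frac{\delta \gamma \sigma_U^2}
                {\alpha^2 \sigma_Z^2 + \gamma^2 \sigma_U^2 + \sigma_{\varepsilon_A}^2}.
\]
% Slope of Y on A after adjusting for Z: partialling Z out of A leaves
% \tilde{A} = \gamma U + \varepsilon_A, so
\[
\frac{\mathrm{cov}(\tilde{A}, Y)}{\mathrm{var}(\tilde{A})}
  = \tau + \frac{\delta \gamma \sigma_U^2}
                {\gamma^2 \sigma_U^2 + \sigma_{\varepsilon_A}^2}.
\]
% The two bias terms share a numerator; the adjusted one has the smaller
% denominator, so
\[
\frac{|\text{adjusted bias}|}{|\text{unadjusted bias}|}
  = \frac{\mathrm{var}(A)}{\mathrm{var}(A) - \alpha^2 \sigma_Z^2} \ge 1 .
\]
```

Under these assumptions the amplification factor depends only on how much of the variance of A is explained by Z: the stronger the instrument, the larger the amplification.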


3. SCALAR INSTRUMENTAL VARIABLE AND SCALAR CONFOUNDER 


We first give a theorem for a scalar instrumental variable Z and a scalar confounder U. 
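THEOREM 1. In the causal diagram of Figure 1(b) with scalar Z and U, if


(a) pr(A = 1 | Z = z) is non-decreasing in z, pr(A = 1 | U = u) is non-decreasing in u, and
E(Y | A = a, U = u) is non-decreasing in u for both a = 0 and 1;
(b) E(Y | A = a, Z = z) is non-increasing in z for both a = 0 and 1,


then

\[
\begin{pmatrix} \mathrm{ACE}_1^{\mathrm{adj}} \\ \mathrm{ACE}_0^{\mathrm{adj}} \\ \mathrm{ACE}^{\mathrm{adj}} \end{pmatrix}
\ge
\begin{pmatrix} \mathrm{ACE}^{\mathrm{unadj}} \\ \mathrm{ACE}^{\mathrm{unadj}} \\ \mathrm{ACE}^{\mathrm{unadj}} \end{pmatrix}
\ge
\begin{pmatrix} \mathrm{ACE}_1^{\mathrm{true}} \\ \mathrm{ACE}_0^{\mathrm{true}} \\ \mathrm{ACE}^{\mathrm{true}} \end{pmatrix}.
\tag{1}
\]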


Inequalities among vectors as in (1) should be interpreted as component-wise relationships. 
Intuitively, the monotonicity in Condition (a) of Theorem 1 requires non-negative dependence
structures on arrows Z → A, U → A and U → Y in the causal diagram of Figure 1(b). Because
the dependence is in expectation, Condition (a) of Theorem 1 is weaker than the requirement of
signed directed acyclic graphs (VanderWeele & Robins, 2010).

The monotonicity in Condition (b) of Theorem 1 reflects the collider bias caused by condi- 
tioning on A. As noted by Greenland (2003), in many cases, if Z and U affect A in the same 
direction, then the collider bias caused by conditioning on A is often in the opposite direction. 
Lemmas S5-S8 in the Supplementary Material show that, if Z and U are independent and have 
non-negative additive or multiplicative effects on A, then conditioning on A results in negative 
association between Z and U. This negative collider bias, coupled with the positive association 
between U and Y, further implies negative association between Z and Y conditional on A as 
stated in Condition (b) of Theorem 1. 
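The collider mechanism can be checked by direct enumeration for binary Z and U. The sketch below assumes a hypothetical additive treatment model, pr(A = 1 | Z = z, U = u) = 0.1 + 0.2z + 0.3u, with independent Z and U and pr(U = 1) = 0.5; the function name and the nested-list layout `pzu[z][u]` are illustrative choices, not part of the paper.

```python
from fractions import Fraction as F


def pr_u_given_a_z(pzu, a, z, pu=F(1, 2)):
    """Return pr(U=1 | A=a, Z=z) for binary, independent Z and U.

    pzu[z][u] is pr(A=1 | Z=z, U=u); pu is pr(U=1).
    """
    # Likelihood of the observed treatment value under U=0 and U=1.
    like = [pzu[z][u] if a == 1 else 1 - pzu[z][u] for u in (0, 1)]
    # Bayes' rule; Z drops out of the prior because Z is independent of U.
    return like[1] * pu / (like[1] * pu + like[0] * (1 - pu))


# Hypothetical additive model (an assumption for illustration only).
pzu = [[F(1, 10) + F(2, 10) * z + F(3, 10) * u for u in (0, 1)] for z in (0, 1)]

for a in (1, 0):
    at_z0 = pr_u_given_a_z(pzu, a, 0)
    at_z1 = pr_u_given_a_z(pzu, a, 1)
    # Within each treatment arm, a larger Z goes with a smaller chance of U=1.
    print(f"A={a}: pr(U=1 | Z=0) = {at_z0}, pr(U=1 | Z=1) = {at_z1}")
```

In both treatment arms the conditional probability of U = 1 falls as Z moves from 0 to 1, even though Z and U are marginally independent, which is the negative association that Condition (b) builds on.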

For easy interpretation, we will give sufficient conditions for Z-Bias which require no interac- 
tion of Z and U on A. When A given Z and U follows an additive model, we have the following 
theorem. 


THEOREM 2. In the causal diagram of Figure 1(b) with scalar Z and U, (1) holds if 


(a) $\mathrm{pr}(A = 1 \mid Z = z, U = u) = \beta(z) + \gamma(u)$;

(b) $\beta(z)$ is non-decreasing in z, $\gamma(u)$ is non-decreasing in u, and $E(Y \mid A = a, U = u)$ is
non-decreasing in u for both a = 1 and 0;

(c) the essential supremum of U given (A = a, Z = z) depends only on a. 


In summary, when A given Z and U follows an additive model and monotonicity of Theorem 
2 holds, both unadjusted and adjusted estimators have non-negative biases for the true average 
causal effects for the treatment, control and the whole populations. Furthermore, the adjusted 
estimators, either for the treatment, control or the whole populations, have larger biases than the 
unadjusted estimator, i.e., Z-Bias arises. 

When both the instrumental variable Z and the confounder U are binary, Theorem 2 has an
even more interpretable form. Define $p_{zu} = \mathrm{pr}(A = 1 \mid Z = z, U = u)$ for z, u = 0 and 1.


COROLLARY 1. In the causal diagram of Figure 1(b) with binary Z and U, (1) holds if


(a) there is no additive interaction of Z and U on A, i.e., $p_{11} - p_{10} - p_{01} + p_{00} = 0$;
(b) Z and U have monotonic effects on A, i.e., $p_{11} \ge \max(p_{10}, p_{01})$ and $\min(p_{10}, p_{01}) \ge p_{00}$,
and $E(Y \mid A = a, U = 1) \ge E(Y \mid A = a, U = 0)$ for both a = 1 and 0.

When A given Z and U follows a multiplicative model, we have the following theorem.


THEOREM 3. In the causal diagram of Figure 1(b) with scalar Z and U, (1) holds if we
replace Condition (a) of Theorem 2 by


(a’) $\mathrm{pr}(A = 1 \mid Z = z, U = u) = \beta(z)\gamma(u)$.


When both the instrument Z and the confounder U are binary, Theorem 3 can be simplified. 


COROLLARY 2. In the causal diagram of Figure 1(b) with binary Z and U, (1) holds if we 
replace Condition (a) of Corollary 1 by 


(a’) there is no multiplicative interaction of Z and U on A, i.e., $p_{11}p_{00} = p_{10}p_{01}$.


We invoke the assumptions of no additive and multiplicative interaction of Z and U on A in 
Theorems 2 and 3 for easy interpretation. They are sufficient but not necessary conditions for 
Z-Bias. In fact, we show in the proofs that Conditions (a) and (a’) in Theorems 2 and 3 and 
Corollaries 1 and 2 can be replaced by weaker conditions. For the case with binary Z and U, 
these conditions are particularly easy to interpret:

\[
\frac{p_{11}p_{00}}{p_{10}p_{01}} \le 1, \qquad
\frac{(1 - p_{11})(1 - p_{00})}{(1 - p_{10})(1 - p_{01})} \le 1,
\tag{2}
\]

i.e., Z and U have non-positive multiplicative interaction on both the presence and absence of
A. Even if Condition (a) or (a’) does not hold, one can show that half of the parameter space
of $(p_{11}, p_{10}, p_{01}, p_{00})$ satisfies the weaker condition (2), which is only sufficient, not necessary.
Therefore, even in the presence of additive or multiplicative interaction, Z-Bias arises in more 
than half of the parameter space for binary (Z,U, A, Y ). 
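Condition (2) is easy to evaluate for any given set of treatment probabilities. The following sketch uses a hypothetical helper, `condition_2`, written in cross-multiplied form (which assumes all four probabilities lie strictly between 0 and 1), and applies it to one additive example, one multiplicative example, and the three cases of Table 1. The two no-interaction examples are made-up values chosen for illustration.

```python
from fractions import Fraction as F


def condition_2(p11, p10, p01, p00):
    """Check both inequalities of condition (2) for probabilities in (0, 1).

    Arguments follow the paper's indexing p_zu = pr(A=1 | Z=z, U=u).
    """
    # Non-positive multiplicative interaction on the presence of A.
    presence = p11 * p00 <= p10 * p01
    # Non-positive multiplicative interaction on the absence of A.
    absence = (1 - p11) * (1 - p00) <= (1 - p10) * (1 - p01)
    return presence and absence


def probs(*values):
    """Convert decimal strings to exact fractions to avoid rounding issues."""
    return tuple(F(v) for v in values)


examples = {
    "additive (hypothetical)": probs("0.6", "0.3", "0.4", "0.1"),
    "multiplicative (hypothetical)": probs("0.6", "0.2", "0.3", "0.1"),
    "Table 1, Case 1": probs("0.8", "0.6", "0.2", "0.1"),
    "Table 1, Case 2": probs("0.3", "0.2", "0.3", "0.1"),
    "Table 1, Case 3": probs("0.5", "0.4", "0.4", "0.1"),
}

for name, p in examples.items():
    print(f"{name}: condition (2) holds = {condition_2(*p)}")
```

The two no-interaction examples satisfy (2), in line with Lemma S5 for the additive case, while only Case 1 of Table 1 does.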


4. GENERAL INSTRUMENTAL VARIABLE AND GENERAL CONFOUNDER 


When the instrumental variable Z and the confounder U are vectors, Theorems 1-3 still hold if
the monotonicity assumptions hold for each component of Z and U, and Z and U are multivariate
totally positive of order two (Karlin & Rinott, 1980), including the case that the components
of Z and U are mutually independent (Esary et al., 1967). A random vector W is multivariate
totally positive of order two if its density f(·) satisfies
$f\{\max(w_1, w_2)\} f\{\min(w_1, w_2)\} \ge f(w_1) f(w_2)$, where $\max(w_1, w_2)$ and $\min(w_1, w_2)$ are
the component-wise maximum and minimum of the vectors $w_1$ and $w_2$. In the following, we will
develop general theory for Z-Bias without the total positivity assumption about the components of Z and U.
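For a discrete random vector, the total positivity inequality can be verified by brute force over all pairs of support points. The sketch below is a minimal illustration for probability mass functions stored as dictionaries; the function name `is_mtp2` and the three example distributions are assumptions made for this example.

```python
from fractions import Fraction as F
from itertools import product


def is_mtp2(pmf):
    """Check f(max(w1, w2)) * f(min(w1, w2)) >= f(w1) * f(w2) for all pairs.

    pmf maps equal-length tuples to probabilities; points missing from the
    dictionary are treated as having probability zero.
    """
    for w1, w2 in product(pmf, repeat=2):
        hi = tuple(max(a, b) for a, b in zip(w1, w2))  # component-wise maximum
        lo = tuple(min(a, b) for a, b in zip(w1, w2))  # component-wise minimum
        if pmf.get(hi, 0) * pmf.get(lo, 0) < pmf[w1] * pmf[w2]:
            return False
    return True


# Independent components: the inequality holds with equality.
independent = {
    (i, j): (F(3, 10) if i else F(7, 10)) * (F(6, 10) if j else F(4, 10))
    for i in (0, 1)
    for j in (0, 1)
}
# Mass concentrated on concordant points (0,0) and (1,1).
positive = {(0, 0): F(4, 10), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(4, 10)}
# Mass concentrated on discordant points (0,1) and (1,0).
negative = {(0, 0): F(1, 10), (0, 1): F(4, 10), (1, 0): F(4, 10), (1, 1): F(1, 10)}

for name, pmf in [("independent", independent), ("positive", positive), ("negative", negative)]:
    print(f"{name}: MTP2 = {is_mtp2(pmf)}")
```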

It is relatively straightforward to summarize a general instrumental variable Z by a scalar
propensity score $\Pi = \Pi(Z) = \mathrm{pr}(A = 1 \mid Z)$, because $Z \perp\!\!\!\perp A \mid \Pi(Z)$ as shown in
Rosenbaum & Rubin (1983). We define $\nu_a(\pi) = E(Y \mid A = a, \Pi = \pi)$. The adjusted estimator for
the population under treatment is

\[ \mathrm{ACE}_1^{\mathrm{adj}} = E(Y \mid A = 1) - \int \nu_0(\pi) F(d\pi \mid A = 1), \]

the adjusted estimator for the population under control is

\[ \mathrm{ACE}_0^{\mathrm{adj}} = \int \nu_1(\pi) F(d\pi \mid A = 0) - E(Y \mid A = 0), \]

and the adjusted estimator for the whole population is

\[ \mathrm{ACE}^{\mathrm{adj}} = \int \nu_1(\pi) F(d\pi) - \int \nu_0(\pi) F(d\pi). \]


When Z is scalar, then the above three formulas reduce to the ones in Section 3. 

Greenland & Robins (1986) showed that for the causal effect on the treated population, Y (0) 
alone suffices to control for confounding; likewise, for the causal effect on the control popula- 
tion, Y(1) alone suffices to control for confounding. If interest lies in all three of our average 
causal effects, then we need to take U = {Y (1), Y(0)} as the ultimate confounder for the rela- 
tionship of A on Y. This is not an assumption about U. Because Y = AY (1) + (1 — A)Y(0) 
is a deterministic function of A and {Y(1), Y (0)}, this implies that U = {Y(1), Y (0)} satisfies 
the ignorability assumption (Rosenbaum & Rubin, 1983), or blocks all the back-door paths from 
A to Y (Pearl, 1995, 2000). We represent the causal structure in Figure 2.

We first state a theorem without assuming the structure of the causal diagram in Figure 2. 


THEOREM 4. If for both a = 1 and 0, $\mathrm{pr}\{A = 1 \mid Y(a)\}$ is non-decreasing in Y(a), and
$\mathrm{cov}\{\Pi, \nu_a(\Pi)\} \le 0$, then (1) holds.

Fig. 2: Directed Acyclic Graph for Z-Bias With General Instrument and Confounder. [Diagram not
reproduced; its nodes are U = {Y(1), Y(0)}, Z, Π = Π(Z), A and Y, with the chain Π = Π(Z) → A → Y.]


In a randomized experiment $A \perp\!\!\!\perp Y(a)$, so the dependence of $\mathrm{pr}\{A = 1 \mid Y(a)\}$ on Y(a)
characterizes the self-selection process of an observational study. The condition
$\mathrm{cov}\{\Pi, \nu_a(\Pi)\} \le 0$ in Theorem 4 is another measure of the collider bias caused by conditioning
on A, as $\nu_a(\pi) = E\{Y(a) \mid A = a, \Pi = \pi\}$ and Y(a) is a component of U in Figure 2. This measure of
collider bias is more general than the one in Theorem 1. Analogous to Section 3, we will present more
transparent sufficient conditions for Z-Bias to aid interpretation. 

In the following, we use the distributional association measure (Cox & Wermuth, 2003; Ma
et al., 2006; Xie et al., 2008), i.e., random variable V has a non-negative distributional association
on random variable W if the conditional distribution satisfies $\partial F(w \mid v)/\partial v \le 0$ for all v and
w. If the random variables are discrete, then partial differentiation is replaced by differencing
between adjacent levels (Cox & Wermuth, 2003).

If there is no additive interaction between Π and {Y(1), Y(0)} on A, then we have the following
results.


THEOREM 5. In the causal diagram of Figure 2, (1) holds if 


(a) $\mathrm{pr}(A = 1 \mid \Pi, U) = \Pi + \delta\{Y(1)\} + \eta\{Y(0)\}$ with $\delta(\cdot)$ and $\eta(\cdot)$ being non-decreasing;

(b) {Y(1), Y(0)} have non-negative distributional associations on each other, i.e.,
$\partial F(y_1 \mid y_0)/\partial y_0 \le 0$ and $\partial F(y_0 \mid y_1)/\partial y_1 \le 0$ for all $y_1$ and $y_0$;

(c) the essential supremum of Y(1) given Y(0) does not depend on Y(0), and the essential
supremum of Y(0) given Y(1) does not depend on Y(1).


Remark 1. If we impose an additive model $\mathrm{pr}(A = 1 \mid \Pi, U) = h(\Pi) + \delta\{Y(1)\} + \eta\{Y(0)\}$,
then independence of Π and U implies that
$\mathrm{pr}(A = 1 \mid \Pi) = h(\Pi) + E[\delta\{Y(1)\}] + E[\eta\{Y(0)\}] = \Pi$.
Therefore, we must have $h(\Pi) = \Pi$ and $E[\delta\{Y(1)\}] + E[\eta\{Y(0)\}] = 0$.


When the outcome is binary, the distributional association between Y(1) and Y(0) becomes
their odds ratio (Xie et al., 2008), and non-negative distributional association between Y(1) and
Y(0) is equivalent to

\[
\mathrm{OR}_Y = \frac{\mathrm{pr}\{Y(1) = 1, Y(0) = 1\}\,\mathrm{pr}\{Y(1) = 0, Y(0) = 0\}}
                     {\mathrm{pr}\{Y(1) = 1, Y(0) = 0\}\,\mathrm{pr}\{Y(1) = 0, Y(0) = 1\}} \ge 1.
\]

We can further relax the model assumption of A given Π and U by allowing for non-negative
interaction between Y(1) and Y(0) on A.


COROLLARY 3. In the causal diagram of Figure 2 with a binary outcome Y, (1) holds if 


(a) $\mathrm{pr}(A = 1 \mid \Pi, U) = \alpha + \Pi + \delta Y(1) + \eta Y(0) + \theta Y(1)Y(0)$ with $\delta, \eta, \theta \ge 0$;
(b) $\mathrm{OR}_Y \ge 1$.
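As a small illustration of the odds ratio condition for binary potential outcomes, the sketch below computes OR_Y from a joint distribution of {Y(1), Y(0)} and checks that OR_Y ≥ 1 agrees with the distributional association condition. The joint probabilities are invented for illustration; in practice this joint distribution is not identifiable from observed data, because Y(1) and Y(0) are never observed together for the same unit. The function names are also assumptions.

```python
from fractions import Fraction as F


def potential_outcome_odds_ratio(joint):
    """Odds ratio OR_Y, with joint[(y1, y0)] = pr{Y(1)=y1, Y(0)=y0}."""
    concordant = joint[(1, 1)] * joint[(0, 0)]
    discordant = joint[(1, 0)] * joint[(0, 1)]
    return concordant / discordant


def nonnegative_association(joint):
    """Discrete version of dF(y1 | y0)/dy0 <= 0 for binary outcomes:
    pr{Y(1)=1 | Y(0)=1} >= pr{Y(1)=1 | Y(0)=0}."""
    given_y0_1 = joint[(1, 1)] / (joint[(1, 1)] + joint[(0, 1)])
    given_y0_0 = joint[(1, 0)] / (joint[(1, 0)] + joint[(0, 0)])
    return given_y0_1 >= given_y0_0


# Hypothetical joint distribution of the potential outcomes.
joint = {(1, 1): F(3, 10), (0, 0): F(4, 10), (1, 0): F(2, 10), (0, 1): F(1, 10)}

print("OR_Y =", potential_outcome_odds_ratio(joint))
print("non-negative association:", nonnegative_association(joint))
```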
Remark 2. If we have an additive model of A given Π and U, $\mathrm{pr}(A = 1 \mid \Pi, U) = h(\Pi) + g(U)$,
then the functional form $g(U) = \alpha + \delta Y(1) + \eta Y(0) + \theta Y(1)Y(0)$ imposes no
restriction for binary outcome. Furthermore, $\mathrm{pr}(A = 1 \mid \Pi) = \Pi$ implies that $h(\Pi) = \Pi$ and
$E\{g(U)\} = 0$, i.e., $\alpha = -\delta E\{Y(1)\} - \eta E\{Y(0)\} - \theta E\{Y(1)Y(0)\}$. Therefore, the additive
model in Condition (a) of Corollary 3 is

\[
\mathrm{pr}(A = 1 \mid \Pi, U) = \Pi + \delta[Y(1) - E\{Y(1)\}] + \eta[Y(0) - E\{Y(0)\}]
  + \theta[Y(1)Y(0) - E\{Y(1)Y(0)\}].
\]


If there is no multiplicative interaction of Π and {Y(1), Y(0)} on A, then we have the following
results.


THEOREM 6. In the causal diagram of Figure 2, (1) holds if we replace Condition (a) of 
Theorem 5 by 


(a’) $\mathrm{pr}(A = 1 \mid \Pi, U) = \Pi\,\delta\{Y(1)\}\,\eta\{Y(0)\}$ with $\delta(\cdot)$ and $\eta(\cdot)$ being non-decreasing.


COROLLARY 4. In the causal diagram of Figure 2 with a binary outcome Y, (1) holds if we 
replace Condition (a) of Corollary 3 by 


(a’) $\mathrm{pr}(A = 1 \mid \Pi, U) = \alpha\,\Pi\,\delta^{Y(1)}\eta^{Y(0)}\theta^{Y(1)Y(0)}$ with $\delta, \eta, \theta \ge 1$.


5. ILLUSTRATIONS 


5·1. Numerical Examples
Myers et al. (2011) simulated binary (Z, U, A, Y) to investigate Z-Bias. They generated (Z, U)
according to pr(Z = 1) = 0.5 and pr(U = 1) = $\gamma_0$. The first set of their generative models is
additive,

\[
\mathrm{pr}(A = 1 \mid U, Z) = \alpha_0 + \alpha_1 U + \alpha_2 Z, \qquad
\mathrm{pr}(Y = 1 \mid U, A) = \beta_0 + \beta_1 U + \beta_2 A,
\tag{3}
\]

and the second set is multiplicative,

\[
\mathrm{pr}(A = 1 \mid U, Z) = \alpha_0 \alpha_1^{U} \alpha_2^{Z}, \qquad
\mathrm{pr}(Y = 1 \mid U, A) = \beta_0 \beta_1^{U} \beta_2^{A},
\tag{4}
\]

where the coefficients in (3) and (4) are all positive. They use simulation to show that Z-Bias
arises under these models. In fact, in the above models, Z and U have monotonic effects on A 
without additive or multiplicative interactions, and U acts monotonically on Y, given A. There- 
fore, Corollaries 1 and 2 imply that Z-Bias must occur. The qualitative conclusion follows imme- 
diately from our theory. However, our theory does not make statements about the magnitude of 
the bias, and for more details about the magnitude and finite sample properties, see Myers et al. 
(2011). 

We further use three numerical examples to illustrate the role of the no-interaction assumptions
required by Theorems 2 and 3 and Corollaries 1 and 2. Recall the conditional probability of
the treatment A, $p_{zu} = \mathrm{pr}(A = 1 \mid Z = z, U = u)$, and define the conditional probabilities of the
outcome Y as $r_{au} = \mathrm{pr}(Y = 1 \mid A = a, U = u)$, for z, a, u = 0, 1. Table 1 gives three examples,
where monotonicity on the conditional distributions of A and Y holds, and there are both additive
and multiplicative interactions. In all cases, the instrumental variable Z is Bernoulli(p = 0.5),
and the confounder U is another independent Bernoulli(π = 0.5). In Case 1, the weaker condition
(2) holds, and our theory implies that Z-Bias arises. In Case 2, neither the condition in
Theorem 1 nor (2) holds, but Z-Bias still arises; our conditions are sufficient but not necessary.
In Case 3, neither the condition in Theorem 1 nor (2) holds, and Z-Bias does not arise.

Table 1: Examples for the presence and absence of Z-Bias, in which Z ~ Bernoulli(0.5), U ~
Bernoulli(0.5), the conditional probability of the treatment A is $p_{zu} = \mathrm{pr}(A = 1 \mid Z = z, U = u)$,
and the conditional probability of the outcome Y is $r_{au} = \mathrm{pr}(Y = 1 \mid A = a, U = u)$.

Case   p11   p10   p01   p00   r11    r10    r01    r00    ACE^true   ACE^unadj   ACE^adj   Z-Bias
1      0.8   0.6   0.2   0.1   0.08   0.06   0.02   0.01   0.0550     0.0574      0.0584    YES
2      0.3   0.2   0.3   0.1   0.03   0.02   0.03   0.01   0.0050     0.0076      0.0077    YES
3      0.5   0.4   0.4   0.1   0.04   0.04   0.04   0.01   0.0150     0.0173      0.0172    NO

Finally, for binary (Z, U, A, Y) we use Monte Carlo to compute the volume of the Z-Bias
space, i.e., the parameter space of p, π, the $p_{zu}$'s and the $r_{au}$'s in which the adjusted estimator has
higher bias than the unadjusted estimator. We randomly draw these ten probabilities from independent
Uniform(0, 1) random variables, and for each draw of these probabilities we compute the average
causal effect $\mathrm{ACE}^{\mathrm{true}}$, the unadjusted estimator $\mathrm{ACE}^{\mathrm{unadj}}$ and the adjusted estimator $\mathrm{ACE}^{\mathrm{adj}}$.
We plot the joint values of the biases
$(\mathrm{ACE}^{\mathrm{adj}} - \mathrm{ACE}^{\mathrm{true}}, \mathrm{ACE}^{\mathrm{unadj}} - \mathrm{ACE}^{\mathrm{true}})$ in Figure 3.
The volume of the Z-Bias space can be approximated by the frequency that $\mathrm{ACE}^{\mathrm{adj}}$ deviates
more from $\mathrm{ACE}^{\mathrm{true}}$ than $\mathrm{ACE}^{\mathrm{unadj}}$ does. With $10^6$ random draws, our Monte Carlo gives an unbiased
estimate for this volume as 0.6805 with estimated standard error 0.0005. Therefore, in about
68% of the parameter space, the adjusted estimator is more biased than the unadjusted estimator.

Fig. 3: Biases of the adjusted and unadjusted estimators over $10^6$ draws of the probabilities. In
areas $(Z_1, Z_2, Z_3, Z_4)$ Z-Bias arises, and in areas $(\bar{Z}_1, \bar{Z}_2, \bar{Z}_3, \bar{Z}_4)$ Z-Bias does not arise.
[Plot not reproduced; one axis is the bias of the adjusted estimator.]
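The entries of Table 1 and the Monte Carlo volume can be recomputed by exact enumeration over the binary variables. The sketch below is one possible implementation, not the authors' code: the function names, the nested-list layouts `pzu[z][u]` and `rau[a][u]`, the seed, and the smaller number of draws are all assumptions. It reads "more biased" as a larger absolute deviation from the true effect, following the description in the text.

```python
import random


def estimators(p, pi, pzu, rau):
    """Return (ACE_true, ACE_unadj, ACE_adj) for binary (Z, U, A, Y).

    p = pr(Z=1), pi = pr(U=1), with Z independent of U;
    pzu[z][u] = pr(A=1 | Z=z, U=u); rau[a][u] = pr(Y=1 | A=a, U=u).
    """
    pz, pu = [1 - p, p], [1 - pi, pi]

    def pa(a, z, u):
        # pr(A=a | Z=z, U=u)
        return pzu[z][u] if a == 1 else 1 - pzu[z][u]

    def mean_y_given_a(a):
        # E(Y | A=a): weight r_au by pr(U=u, A=a), then normalize.
        w = [pu[u] * sum(pa(a, z, u) * pz[z] for z in (0, 1)) for u in (0, 1)]
        return sum(rau[a][u] * w[u] for u in (0, 1)) / sum(w)

    def mu(a, z):
        # mu_a(z) = E(Y | A=a, Z=z): weight r_au by pr(U=u, A=a | Z=z).
        w = [pu[u] * pa(a, z, u) for u in (0, 1)]
        return sum(rau[a][u] * w[u] for u in (0, 1)) / sum(w)

    true = sum((rau[1][u] - rau[0][u]) * pu[u] for u in (0, 1))
    unadj = mean_y_given_a(1) - mean_y_given_a(0)
    adj = sum((mu(1, z) - mu(0, z)) * pz[z] for z in (0, 1))
    return true, unadj, adj


# Table 1 inputs as ([[p00, p01], [p10, p11]], [[r00, r01], [r10, r11]]),
# followed by the reported (ACE_true, ACE_unadj, ACE_adj).
TABLE1 = {
    1: ([[0.1, 0.2], [0.6, 0.8]], [[0.01, 0.02], [0.06, 0.08]], (0.0550, 0.0574, 0.0584)),
    2: ([[0.1, 0.3], [0.2, 0.3]], [[0.01, 0.03], [0.02, 0.03]], (0.0050, 0.0076, 0.0077)),
    3: ([[0.1, 0.4], [0.4, 0.5]], [[0.01, 0.04], [0.04, 0.04]], (0.0150, 0.0173, 0.0172)),
}


def z_bias_volume(n_draws, seed=0):
    """Monte Carlo share of parameter draws in which |adj - true| > |unadj - true|."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        p, pi = rng.random(), rng.random()
        pzu = [[rng.random(), rng.random()], [rng.random(), rng.random()]]
        rau = [[rng.random(), rng.random()], [rng.random(), rng.random()]]
        true, unadj, adj = estimators(p, pi, pzu, rau)
        hits += abs(adj - true) > abs(unadj - true)
    return hits / n_draws


for case, (pzu, rau, reported) in TABLE1.items():
    computed = estimators(0.5, 0.5, pzu, rau)
    print(case, [round(x, 4) for x in computed], "reported:", reported)

print("estimated Z-Bias volume:", z_bias_volume(50_000))
```

With far fewer draws than the $10^6$ used in the text, the volume estimate carries a larger Monte Carlo error, so it should only be expected to land in the neighbourhood of the reported value.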


5·2. Real Data Examples


Bhattacharya & Vogt (2012) presented an example about the treatment effect of small classroom
in the third grade on test scores for reading. Their instrumental variable analysis gave point
estimate 8.73 with standard error 2.01. Without adjusting for the instrumental variable in the
propensity score model, the point estimate was 6.00 with estimated standard error 1.34; adjusting
for the instrumental variable, the point estimate was 2.97 with estimated standard error 1.84.
The difference between the adjusted estimator and the instrumental variable estimator is larger
than that between the unadjusted estimator and the instrumental variable estimator.

Table 2: The example from Wooldridge (2010).

Estimator               Point estimate   Standard error   Lower confidence limit   Upper confidence limit
Instrumental variable   2.47             0.59             1.31                     3.62
Unadjusted              1.77             0.07             1.64                     1.90
Adjusted                1.76             0.07             1.64                     1.89

Fig. 4: Directed Acyclic Graph for Z-Bias Allowing for an Arrow from Z to Y. [Diagram not
reproduced; its nodes are U, Z, A and Y, with the chain Z → A → Y.]

Wooldridge (2010, Example 21.3) discusses estimating the effect of attaining at least seven 
years of education on fertility, with treatment A being a binary indicator for at least seven years 
of education, outcome Y being the number of living children, and instrumental variable Z being 
a binary indicator if the woman was born in the first half of the year. Although the original data 
set of Wooldridge (2010) contains other variables, most of them are posttreatment variables, so 
we do not adjust for them in our analysis. The instrumental variable analysis gives point estimate 
2.47 with estimated standard error 0.59. The unadjusted analysis gives point estimate 1.77 with 
estimated standard error 0.07. The adjusted analysis gives point estimate 1.76 with estimated 
standard error 0.07. Table 2 summarizes the results. In this example, the adjusted and unadjusted 
estimators give similar results. 


6. DISCUSSION 
6·1. Allowing for an Arrow from Z to Y


When the variable Z has an arrow to the outcome Y as illustrated by Figure 4, the following 
generalization of Theorem 1 holds. 


THEOREM 7. Consider the causal diagram of Figure 4 with scalar Z and U, where $Z \perp\!\!\!\perp U$
and $A \perp\!\!\!\perp Y(a) \mid (Z, U)$ for a = 0 and 1. The result in (1) holds if we replace Condition (a) of
Theorem 1 by


(a’) $\mathrm{pr}(A = 1 \mid Z = z, U = u)$ and $E(Y \mid A = a, Z = z, U = u)$ are non-decreasing in z and
u for a = 0 and 1.


However, when there is an arrow from Z to Y, Theorem 7 is of little use in practice without 
strong substantive knowledge about the size of the direct effect of Z on Y. In particular, neither 
Theorem 2 nor Theorem 3 is true when an arrow from Z to Y is present. This reflects the
fact that neither the absence of an additive nor the absence of a multiplicative interaction of Z
and U on A is sufficient to conclude that $E(Y \mid A = a, Z = z)$ is non-increasing in z when
$E(Y \mid A = a, U = u, Z = z)$ is non-decreasing in z and u.

With a general instrumental variable and a general confounder, Theorem 4 holds without any 
assumptions on the underlying causal diagram, and therefore it holds even if the variable Z 
affects the outcome directly. However, Theorems 5 and 6 no longer hold if an arrow from Z to 
Y is present as in Figure 4. This reflects the fact that the absence of an additive or multiplicative
interaction of U and Π on A no longer implies $\mathrm{cov}\{\Pi, \nu_a(\Pi)\} \le 0$ when Z has a direct effect
on Y, even if the remaining conditions of Theorems 5 and 6 hold. Analogously, Theorems 5 and
6 no longer hold if there exists an unmeasured common cause of Z and Y on the causal diagram
in Figure 1(b), even if Z has no direct effect on Y. 


6·2. Extensions

In §§2-4, we discussed Z-Bias for the average causal effects. We can extend the results to 
distributional causal effects for general outcomes (Ju & Geng, 2010) and causal risk ratios for 
binary or positive outcomes. Moreover, the results in §§2—4 are conditional on or within the strata 
of observed covariates. Similar results hold for causal effects averaged over observed covariates. 
We give more details in the Supplementary Material. In this paper we have given sufficient condi- 
tions for the presence of Z-Bias; future work could consider sufficient conditions for the absence 
of Z-Bias. 


6·3. Conclusion

It is often suggested that we should adjust for all pretreatment covariates in observational stud- 
ies. However, we show that in a wide class of models satisfying certain monotonicity, adjusting 
for an instrumental variable actually amplifies the impact of the unmeasured treatment-outcome 
confounding, which results in more bias than the unadjusted estimator. In practice, we may not 
be sure about whether a covariate is a confounder, for which one needs to control, or perhaps 
instead an instrumental variable, for which control would only increase any existing bias due 
to unmeasured confounding. Therefore, a more practical approach, as suggested by Rosenbaum 
(2010, Chapter 18.2) and Brookhart et al. (2010), may be to conduct analysis both with and with- 
out adjusting for the covariate. If two analyses give similar results, as in the example in Table 2, 
then we need not worry about Z-Bias; otherwise, we need additional information and analysis 
before making decisions. 
Supplementary material for “Instrumental variables as bias
amplifiers with general outcome and confounding” 


APPENDIX 1. LEMMAS AND THEIR PROOFS 


In order to prove the main results, we need to invoke the following lemmas. Some of them are 
from the literature, and some of them are new and of independent interest. 
Lemma S1 is from Esary et al. (1967, Theorem 2.1). 


LEMMA S1. Let f(·) and g(·) be functions with K real-valued arguments, which are both
non-decreasing in each of their arguments. If $U = (U_1, \ldots, U_K)$ is a multivariate random variable
with K mutually independent components, then $\mathrm{cov}\{f(U), g(U)\} \ge 0$.


Lemma S2 is from VanderWeele (2008), and Lemmas S3 and S4 are from Chiba (2009). 


LEMMA S2. For a univariate U or a multivariate U with mutually independent components,
if for a = 1 and 0, $Y(a) \perp\!\!\!\perp A \mid U$, $E(Y \mid A = a, U = u)$ is non-decreasing in each component
of u, and $\mathrm{pr}(A = 1 \mid U = u)$ is non-decreasing in each component of u, then
$E(Y \mid A = 1) \ge E\{Y(1)\}$ and $E(Y \mid A = 0) \le E\{Y(0)\}$.


LEMMA S3. For a univariate U or a multivariate U with mutually independent components,
if $Y(0) \perp\!\!\!\perp A \mid U$, $E(Y \mid A = 0, U = u)$ is non-decreasing in each component of u, and
$\mathrm{pr}(A = 1 \mid U = u)$ is non-decreasing in each component of u, then
$E(Y \mid A = 0) \le E\{Y(0) \mid A = 1\}$.


LEMMA S4. For a univariate U or a multivariate U with mutually independent components,
if $Y(1) \perp\!\!\!\perp A \mid U$, $E(Y \mid A = 1, U = u)$ is non-decreasing in each component of u, and
$\mathrm{pr}(A = 1 \mid U = u)$ is non-decreasing in each component of u, then
$E(Y \mid A = 1) \ge E\{Y(1) \mid A = 0\}$.


Lemma S5, extending Rothman et al. (2008), states that under monotonicity, no additive in- 
teraction implies non-positive multiplicative interactions for both presence and absence of the 
outcome. 


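LEMMA S5. If $p_{11} \geq \max(p_{10}, p_{01})$, $\min(p_{10}, p_{01}) \geq p_{00} > 0$, and $p_{11} - p_{10} - p_{01} + p_{00} = 0$, 
then 

$$
\frac{p_{11}p_{00}}{p_{10}p_{01}} \leq 1, \qquad
\frac{(1 - p_{11})(1 - p_{00})}{(1 - p_{10})(1 - p_{01})} \leq 1. \tag{S5}
$$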


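Proof of Lemma S5. Define $RR_{11} = p_{11}/p_{00} \geq 1$, $RR_{10} = p_{10}/p_{00} \geq 1$ and $RR_{01} = 
p_{01}/p_{00} \geq 1$. Then $p_{11} - p_{10} - p_{01} + p_{00} = 0$ implies $RR_{11} = RR_{10} + RR_{01} - 1$, which 
further implies 

$$
\begin{aligned}
\frac{p_{11}p_{00}}{p_{10}p_{01}} = \frac{RR_{11}}{RR_{10}RR_{01}}
&= 1 + \frac{1}{RR_{10}RR_{01}}(RR_{11} - RR_{10}RR_{01}) \\
&= 1 + \frac{1}{RR_{10}RR_{01}}(RR_{10} + RR_{01} - 1 - RR_{10}RR_{01}) \\
&= 1 - \frac{1}{RR_{10}RR_{01}}(RR_{10} - 1)(RR_{01} - 1) \leq 1.
\end{aligned}
$$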
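The second inequality of (S5) follows from 

$$
\begin{aligned}
\frac{(1 - p_{11})(1 - p_{00})}{(1 - p_{10})(1 - p_{01})}
&= 1 + \frac{(1 - p_{11})(1 - p_{00}) - (1 - p_{10})(1 - p_{01})}{(1 - p_{10})(1 - p_{01})} \\
&= 1 + \frac{1}{(1 - p_{10})(1 - p_{01})}\{(1 - p_{11} - p_{00} + p_{11}p_{00}) - (1 - p_{10} - p_{01} + p_{10}p_{01})\} \\
&= 1 + \frac{1}{(1 - p_{10})(1 - p_{01})}(p_{11}p_{00} - p_{10}p_{01}) \\
&= 1 + \frac{p_{10}p_{01}}{(1 - p_{10})(1 - p_{01})}\left(\frac{p_{11}p_{00}}{p_{10}p_{01}} - 1\right) \leq 1,
\end{aligned}
$$

where the third equality uses $p_{11} + p_{00} = p_{10} + p_{01}$ and the final inequality uses the first inequality of (S5). 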
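As an informal numerical check of Lemma S5, the following sketch scans a grid of probabilities that satisfy monotonicity and no additive interaction, and records the largest value of each ratio in (S5). The function names and the grid size are illustrative choices, not part of the paper; exact rational arithmetic is used so that the comparison with 1 is free of rounding error.

```python
from fractions import Fraction
from itertools import product


def interaction_ratios(p11, p10, p01, p00):
    """Return the two ratios that Lemma S5 bounds above by 1."""
    presence = (p11 * p00) / (p10 * p01)
    absence = ((1 - p11) * (1 - p00)) / ((1 - p10) * (1 - p01))
    return presence, absence


def scan_lemma_s5(n=20):
    """Largest ratios over probabilities k/n that meet the lemma's conditions."""
    worst_presence = worst_absence = Fraction(0)
    for i, j, k in product(range(1, n), repeat=3):
        m = j + k - i                    # no additive interaction: p11 = p10 + p01 - p00
        if min(j, k) < i or m >= n:      # monotonicity, and keep p11 < 1
            continue
        p00, p10, p01, p11 = (Fraction(x, n) for x in (i, j, k, m))
        presence, absence = interaction_ratios(p11, p10, p01, p00)
        worst_presence = max(worst_presence, presence)
        worst_absence = max(worst_absence, absence)
    return worst_presence, worst_absence


if __name__ == "__main__":
    # One hypothetical configuration: p11 = 0.4 + 0.3 - 0.1 = 0.6.
    print(interaction_ratios(Fraction(6, 10), Fraction(4, 10),
                             Fraction(3, 10), Fraction(1, 10)))
    print(scan_lemma_s5())
```

Both ratios reach 1 only in degenerate cases such as $p_{10} = p_{00}$, where one cause has no effect.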


Lemma S5 is about interaction between two binary causes, and for our discussion we need to 
extend it to interaction between two general causes. Lemma S6 extends Piegorsch et al. (1994) 
and Yang et al. (1999) by relating the conditional association between two independent causes 
given the outcome to the interaction between the two causes on the outcome. 


LEMMA S6. If $Z \perp\!\!\!\perp U$, and $\mathrm{pr}(A = 1 \mid Z = z, U = u) = \beta(z) + \gamma(u)$ with $\beta(z)$ and $\gamma(u)$ 
non-decreasing in $z$ and $u$, then for both $a = 1$ and $0$ and for all values of $u$ and $z$, 

$$
\frac{\partial F(u \mid A = a, Z = z)}{\partial z} \geq 0,
$$

i.e., $U$ has non-positive distributional dependence on $Z$, given $A$. 


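Proof of Lemma S6. For a fixed $u$ and $z_1 > z_0$, we define 

$$
\begin{aligned}
p_{11} &= \mathrm{pr}(A = 1 \mid U > u, Z = z_1) = \int_u^{\infty} \{\beta(z_1) + \gamma(u')\} F(du') / \{1 - F(u)\}, \\
p_{10} &= \mathrm{pr}(A = 1 \mid U > u, Z = z_0) = \int_u^{\infty} \{\beta(z_0) + \gamma(u')\} F(du') / \{1 - F(u)\}, \\
p_{01} &= \mathrm{pr}(A = 1 \mid U \leq u, Z = z_1) = \int_{-\infty}^{u} \{\beta(z_1) + \gamma(u')\} F(du') / F(u), \\
p_{00} &= \mathrm{pr}(A = 1 \mid U \leq u, Z = z_0) = \int_{-\infty}^{u} \{\beta(z_0) + \gamma(u')\} F(du') / F(u),
\end{aligned}
$$

following from the additive model of $A$ and $Z \perp\!\!\!\perp U$. 
Because $\beta(z_1) \geq \beta(z_0)$, it is straightforward to show that $p_{11} \geq p_{10}$ and $p_{01} \geq p_{00}$. Because 
$\gamma(u)$ is increasing in $u$, we have 

$$
p_{11} \geq \beta(z_1) + \gamma(u), \quad p_{10} \geq \beta(z_0) + \gamma(u), \quad p_{01} \leq \beta(z_1) + \gamma(u), \quad p_{00} \leq \beta(z_0) + \gamma(u),
$$

which imply $p_{11} \geq p_{01}$ and $p_{10} \geq p_{00}$. We further have 

$$
\begin{aligned}
p_{11} - p_{10} - p_{01} + p_{00}
&= \int_u^{\infty} \{\beta(z_1) - \beta(z_0)\} F(du') / \{1 - F(u)\} - \int_{-\infty}^{u} \{\beta(z_1) - \beta(z_0)\} F(du') / F(u) \\
&= 0.
\end{aligned}
$$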
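The four probabilities $(p_{11}, p_{10}, p_{01}, p_{00})$ satisfy the conditions in Lemma S5. Therefore, (S5) 
holds. Replacing the probabilities in (S5) by their definitions above, we have 

$$
\frac{\mathrm{pr}(A = 1 \mid U > u, Z = z_1)\,\mathrm{pr}(A = 1 \mid U \leq u, Z = z_0)}{\mathrm{pr}(A = 1 \mid U > u, Z = z_0)\,\mathrm{pr}(A = 1 \mid U \leq u, Z = z_1)} \leq 1
\iff
\frac{\mathrm{pr}(A = 1 \mid U > u, z_1)}{\mathrm{pr}(A = 1 \mid U \leq u, z_1)} \leq \frac{\mathrm{pr}(A = 1 \mid U > u, z_0)}{\mathrm{pr}(A = 1 \mid U \leq u, z_0)}
$$

and 

$$
\frac{\mathrm{pr}(A = 0 \mid U > u, Z = z_1)\,\mathrm{pr}(A = 0 \mid U \leq u, Z = z_0)}{\mathrm{pr}(A = 0 \mid U > u, Z = z_0)\,\mathrm{pr}(A = 0 \mid U \leq u, Z = z_1)} \leq 1
\iff
\frac{\mathrm{pr}(A = 0 \mid U > u, z_1)}{\mathrm{pr}(A = 0 \mid U \leq u, z_1)} \leq \frac{\mathrm{pr}(A = 0 \mid U > u, z_0)}{\mathrm{pr}(A = 0 \mid U \leq u, z_0)}.
$$

Therefore, for both $a = 1$ and $0$ and for all values of $u$, 

$$
\frac{\mathrm{pr}(A = a \mid U > u, Z = z)}{\mathrm{pr}(A = a \mid U \leq u, Z = z)} \tag{S6}
$$

is non-increasing in $z$. Because of the independence of $Z$ and $U$, we have 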


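$$
\begin{aligned}
F(u \mid A = a, Z = z)
&= \frac{\mathrm{pr}(U \leq u, A = a \mid Z = z)}{\mathrm{pr}(A = a \mid Z = z)} \\
&= \frac{\mathrm{pr}(U \leq u)\,\mathrm{pr}(A = a \mid U \leq u, Z = z)}{\mathrm{pr}(U \leq u)\,\mathrm{pr}(A = a \mid U \leq u, Z = z) + \mathrm{pr}(U > u)\,\mathrm{pr}(A = a \mid U > u, Z = z)} \\
&= \left\{1 + \frac{\mathrm{pr}(U > u)}{\mathrm{pr}(U \leq u)} \cdot \frac{\mathrm{pr}(A = a \mid U > u, Z = z)}{\mathrm{pr}(A = a \mid U \leq u, Z = z)}\right\}^{-1}.
\end{aligned}
$$

Therefore, $F(u \mid A = a, Z = z)$ is a non-increasing function of (S6), which is itself non-increasing in $z$, and the conclusion holds. 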
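The conclusion of Lemma S6 can be illustrated on a discrete $U$. The sketch below computes $F(u \mid A = a, Z = z)$ from the additive treatment model by Bayes' theorem, using the independence of $Z$ and $U$. The helper name and all numerical inputs are hypothetical and were chosen only so that every treatment probability lies inside $(0, 1)$.

```python
from fractions import Fraction as Fr


def conditional_cdf(beta_z, gamma, pu, a):
    """F(u | A = a, Z = z) over the sorted support of a discrete U.

    beta_z : beta(z) at the chosen z
    gamma  : gamma(u_k) for each support point, in increasing order of u_k
    pu     : pr(U = u_k); U is independent of Z
    Assumed model: pr(A = 1 | Z = z, U = u) = beta(z) + gamma(u).
    """
    weights = []
    for g, p in zip(gamma, pu):
        p_treat = beta_z + g
        # pr(U = u_k | A = a, Z = z) is proportional to pr(U = u_k) pr(A = a | z, u_k).
        weights.append(p * (p_treat if a == 1 else 1 - p_treat))
    total = sum(weights)
    cdf, running = [], Fr(0)
    for w in weights:
        running += w
        cdf.append(running / total)
    return cdf


# Hypothetical inputs: two levels of Z and three support points of U.
BETA = {0: Fr(1, 10), 1: Fr(3, 10)}
GAMMA = [Fr(1, 10), Fr(2, 10), Fr(5, 10)]
PU = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]

if __name__ == "__main__":
    for a in (1, 0):
        low = conditional_cdf(BETA[0], GAMMA, PU, a)
        high = conditional_cdf(BETA[1], GAMMA, PU, a)
        print(a, [float(x) for x in low], [float(x) for x in high])
```

In this example the distribution function at the larger value of $Z$ is pointwise at least as large as at the smaller value, for both $a = 1$ and $a = 0$, in line with the lemma.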


Lemmas S5 and S6 above hold under the assumption of no additive interaction, and the fol- 
lowing two lemmas state similar results under the assumption of no multiplicative interaction. 


LEMMA S7. If $p_{11} \geq \max(p_{10}, p_{01})$, $\min(p_{10}, p_{01}) \geq p_{00}$, and $p_{11}p_{00} = p_{10}p_{01}$, then 

$$
\frac{(1 - p_{11})(1 - p_{00})}{(1 - p_{10})(1 - p_{01})} \leq 1.
$$


Proof of Lemma S7. Using the same notation as in the proof of Lemma S5, $p_{11}p_{00} = p_{10}p_{01}$ 
implies $RR_{11} = RR_{10}RR_{01}$, with $RR_{10} \geq 1$, $RR_{01} \geq 1$, and $RR_{11} \geq 1$. Therefore, 

$$
p_{11} - p_{10} - p_{01} + p_{00} = p_{00}(RR_{10}RR_{01} - RR_{10} - RR_{01} + 1) = p_{00}(RR_{10} - 1)(RR_{01} - 1) \geq 0,
$$

which further implies that 

$$
\begin{aligned}
\frac{(1 - p_{11})(1 - p_{00})}{(1 - p_{10})(1 - p_{01})}
&= 1 + \frac{(1 - p_{11})(1 - p_{00}) - (1 - p_{10})(1 - p_{01})}{(1 - p_{10})(1 - p_{01})} \\
&= 1 - \frac{p_{11} - p_{10} - p_{01} + p_{00}}{(1 - p_{10})(1 - p_{01})} \leq 1.
\end{aligned}
$$
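A small worked example may help fix ideas for Lemma S7. The numbers are hypothetical and chosen only for illustration: take $p_{00} = 0.1$, $p_{10} = 0.2$ and $p_{01} = 0.3$, so that no multiplicative interaction gives $p_{11} = p_{10}p_{01}/p_{00} = 0.6$, with $RR_{10} = 2$ and $RR_{01} = 3$. Then 

$$
\begin{aligned}
p_{11} - p_{10} - p_{01} + p_{00} &= 0.6 - 0.2 - 0.3 + 0.1 = 0.2 = p_{00}(RR_{10} - 1)(RR_{01} - 1) = 0.1 \times 1 \times 2, \\
\frac{(1 - p_{11})(1 - p_{00})}{(1 - p_{10})(1 - p_{01})} &= \frac{0.4 \times 0.9}{0.8 \times 0.7} = \frac{0.36}{0.56} = 1 - \frac{0.2}{0.56} = \frac{9}{14} \leq 1.
\end{aligned}
$$

The additive interaction is positive here, and it is exactly this positive term that pushes the ratio for the absence of the outcome below 1. 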


LEMMA S8. If $Z \perp\!\!\!\perp U$, and $\mathrm{pr}(A = 1 \mid Z = z, U = u) = \beta(z)\gamma(u)$ with $\beta(z) > 0$ and 
$\gamma(u) > 0$ non-decreasing in $z$ and $u$, then $Z \perp\!\!\!\perp U \mid A = 1$, and for all values of $u$ and $z$, 

$$
\frac{\partial F(u \mid A = 0, Z = z)}{\partial z} \geq 0,
$$

i.e., $U$ has non-positive distributional dependence on $Z$, given $A = 0$. 


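Proof of Lemma S8. For a fixed $u$ and $z_1 > z_0$, we define 

$$
\begin{aligned}
p_{11} &= \mathrm{pr}(A = 1 \mid U > u, Z = z_1) = \beta(z_1) \int_u^{\infty} \gamma(u') F(du') / \{1 - F(u)\}, \\
p_{10} &= \mathrm{pr}(A = 1 \mid U > u, Z = z_0) = \beta(z_0) \int_u^{\infty} \gamma(u') F(du') / \{1 - F(u)\}, \\
p_{01} &= \mathrm{pr}(A = 1 \mid U \leq u, Z = z_1) = \beta(z_1) \int_{-\infty}^{u} \gamma(u') F(du') / F(u), \\
p_{00} &= \mathrm{pr}(A = 1 \mid U \leq u, Z = z_0) = \beta(z_0) \int_{-\infty}^{u} \gamma(u') F(du') / F(u),
\end{aligned}
$$

following from the multiplicative model of $A$ and $Z \perp\!\!\!\perp U$. Because $\beta(z_1) \geq \beta(z_0)$, we have $p_{11} \geq 
p_{10}$ and $p_{01} \geq p_{00}$. Because $\gamma(u)$ is increasing in $u$, we have 

$$
p_{11} \geq \beta(z_1)\gamma(u), \quad p_{10} \geq \beta(z_0)\gamma(u), \quad p_{01} \leq \beta(z_1)\gamma(u), \quad p_{00} \leq \beta(z_0)\gamma(u),
$$

which imply $p_{11} \geq p_{01}$ and $p_{10} \geq p_{00}$. We can further verify $(p_{11}p_{00})/(p_{10}p_{01}) = 1$. Because 
the four probabilities $(p_{11}, p_{10}, p_{01}, p_{00})$ satisfy the conditions in Lemma S7, we have $\{(1 - 
p_{11})(1 - p_{00})\}/\{(1 - p_{10})(1 - p_{01})\} \leq 1$. Replacing the probabilities by their definitions, we 
have 

$$
\begin{aligned}
\frac{\mathrm{pr}(A = 1 \mid U > u, Z = z_1)\,\mathrm{pr}(A = 1 \mid U \leq u, Z = z_0)}{\mathrm{pr}(A = 1 \mid U > u, Z = z_0)\,\mathrm{pr}(A = 1 \mid U \leq u, Z = z_1)} &= 1, \\
\frac{\mathrm{pr}(A = 0 \mid U > u, Z = z_1)\,\mathrm{pr}(A = 0 \mid U \leq u, Z = z_0)}{\mathrm{pr}(A = 0 \mid U > u, Z = z_0)\,\mathrm{pr}(A = 0 \mid U \leq u, Z = z_1)} &\leq 1.
\end{aligned}
$$

Following the same logic as in the proof of Lemma S6, we can prove that $Z \perp\!\!\!\perp U \mid A = 1$, and that $U$ 
has non-positive distributional dependence on $Z$, given $A = 0$. 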
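The two conclusions of Lemma S8 can be checked on a small discrete example. The sketch below assumes a multiplicative treatment model with hypothetical values of $\beta(z)$, $\gamma(u)$ and $\mathrm{pr}(U = u)$; the function name is illustrative. Given $A = 1$ the factor $\beta(z)$ cancels, so the conditional distribution of $U$ should not depend on $z$, whereas given $A = 0$ the distribution function should be larger at the larger $z$.

```python
from fractions import Fraction as Fr
from itertools import accumulate


def u_given_a_z(beta_z, gamma, pu, a):
    """pmf of a discrete U given (A = a, Z = z).

    Assumed model: pr(A = 1 | Z = z, U = u) = beta(z) * gamma(u), with Z independent of U.
    """
    weights = [p * (beta_z * g if a == 1 else 1 - beta_z * g)
               for g, p in zip(gamma, pu)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical inputs: beta at two levels of Z, gamma on three support points of U.
BETA_MULT = {0: Fr(1, 2), 1: Fr(1)}
GAMMA_MULT = [Fr(1, 5), Fr(2, 5), Fr(4, 5)]
PU_MULT = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]

if __name__ == "__main__":
    for a in (1, 0):
        for z in (0, 1):
            pmf = u_given_a_z(BETA_MULT[z], GAMMA_MULT, PU_MULT, a)
            print(a, z, [float(x) for x in accumulate(pmf)])
```

With exact fractions the two conditional distributions given $A = 1$ coincide exactly, which is the conditional independence $Z \perp\!\!\!\perp U \mid A = 1$ in this example.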


Define f = pr(A = 1) to be the proportion of the population under treatment. The average 
causal effect for the whole population can be written as a convex combination of the average 
causal effects for the treated and control populations: 


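$$
\mathrm{ACE}^{\mathrm{true}} = E\{Y(1)\} - E\{Y(0)\} = f\,\mathrm{ACE}_1^{\mathrm{true}} + (1 - f)\,\mathrm{ACE}_0^{\mathrm{true}}.
$$

Analogously, with a scalar instrumental variable, the adjusted estimator for the whole population 
can be written as 

$$
\mathrm{ACE}^{\mathrm{adj}} = \int \mu_1(z) F(dz) - \int \mu_0(z) F(dz) = f\,\mathrm{ACE}_1^{\mathrm{adj}} + (1 - f)\,\mathrm{ACE}_0^{\mathrm{adj}},
$$

and with a general instrumental variable, 

$$
\mathrm{ACE}^{\mathrm{adj}} = \int \nu_1(\pi) F(d\pi) - \int \nu_0(\pi) F(d\pi) = f\,\mathrm{ACE}_1^{\mathrm{adj}} + (1 - f)\,\mathrm{ACE}_0^{\mathrm{adj}}.
$$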
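LEMMA S9. With a scalar instrumental variable $Z$, the differences between the adjusted and 
unadjusted estimators are 

$$
\begin{aligned}
\mathrm{ACE}_1^{\mathrm{adj}} - \mathrm{ACE}^{\mathrm{unadj}} &= -\frac{\mathrm{cov}\{\Pi(Z), \mu_0(Z)\}}{f(1 - f)}, \\
\mathrm{ACE}_0^{\mathrm{adj}} - \mathrm{ACE}^{\mathrm{unadj}} &= -\frac{\mathrm{cov}\{\Pi(Z), \mu_1(Z)\}}{f(1 - f)}, \\
\mathrm{ACE}^{\mathrm{adj}} - \mathrm{ACE}^{\mathrm{unadj}} &= -\frac{\mathrm{cov}\{\Pi(Z), \mu_0(Z)\}}{1 - f} - \frac{\mathrm{cov}\{\Pi(Z), \mu_1(Z)\}}{f}.
\end{aligned}
$$

With a general instrumental variable $Z$, the above formulas hold if we replace $\Pi(Z)$ by $\Pi$ and 
$\mu_a(Z) = E(Y \mid A = a, Z)$ by $\nu_a(\Pi) = E(Y \mid A = a, \Pi)$. 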


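Proof of Lemma S9. The difference $\mathrm{ACE}_1^{\mathrm{adj}} - \mathrm{ACE}^{\mathrm{unadj}}$ is equal to 

$$
\begin{aligned}
\mathrm{ACE}_1^{\mathrm{adj}} - \mathrm{ACE}^{\mathrm{unadj}}
&= E(Y \mid A = 0) - \int \mu_0(z) F(dz \mid A = 1) \\
&= \int \mu_0(z) F(dz \mid A = 0) - \int \mu_0(z) F(dz \mid A = 1) \\
&= \frac{\int \mu_0(z)\{1 - \Pi(z)\} F(dz)}{1 - f} - \frac{\int \mu_0(z)\Pi(z) F(dz)}{f} \\
&= \frac{1}{f(1 - f)}\left[E\{\mu_0(Z)(1 - \Pi(Z))\}E\{\Pi(Z)\} - E\{\mu_0(Z)\Pi(Z)\}E\{1 - \Pi(Z)\}\right] \\
&= \frac{1}{f(1 - f)}\left[E\{\mu_0(Z)\}E\{\Pi(Z)\} - E\{\mu_0(Z)\Pi(Z)\}\right] \\
&= -\frac{\mathrm{cov}\{\Pi(Z), \mu_0(Z)\}}{f(1 - f)}.
\end{aligned}
$$

Similarly, the difference $\mathrm{ACE}_0^{\mathrm{adj}} - \mathrm{ACE}^{\mathrm{unadj}}$ is equal to 

$$
\mathrm{ACE}_0^{\mathrm{adj}} - \mathrm{ACE}^{\mathrm{unadj}} = \int \mu_1(z) F(dz \mid A = 0) - \int \mu_1(z) F(dz \mid A = 1) = -\frac{\mathrm{cov}\{\Pi(Z), \mu_1(Z)\}}{f(1 - f)}.
$$

Therefore, the difference $\mathrm{ACE}^{\mathrm{adj}} - \mathrm{ACE}^{\mathrm{unadj}}$ is equal to 

$$
\begin{aligned}
\mathrm{ACE}^{\mathrm{adj}} - \mathrm{ACE}^{\mathrm{unadj}}
&= f(\mathrm{ACE}_1^{\mathrm{adj}} - \mathrm{ACE}^{\mathrm{unadj}}) + (1 - f)(\mathrm{ACE}_0^{\mathrm{adj}} - \mathrm{ACE}^{\mathrm{unadj}}) \\
&= -\frac{\mathrm{cov}\{\Pi(Z), \mu_0(Z)\}}{1 - f} - \frac{\mathrm{cov}\{\Pi(Z), \mu_1(Z)\}}{f}.
\end{aligned}
$$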


Analogously, we can prove the results for general instrumental variables. 
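The identities in Lemma S9 are purely algebraic, so they can be verified exactly for any discrete scalar $Z$. The sketch below computes both sides of the three identities from the distribution of $Z$, the propensity $\Pi(z)$ and the regressions $\mu_0(z)$, $\mu_1(z)$. The function name and the numerical inputs are hypothetical; the inputs use a non-decreasing $\Pi$ and non-increasing $\mu_a$, which is the setting of Theorem 1.

```python
from fractions import Fraction as Fr


def lemma_s9_sides(pz, Pi, mu0, mu1):
    """Return {population: (left side, right side)} for the identities of Lemma S9.

    pz[k]  = pr(Z = z_k)
    Pi[k]  = pr(A = 1 | Z = z_k)
    mu0[k] = E(Y | A = 0, Z = z_k),  mu1[k] = E(Y | A = 1, Z = z_k)
    """
    def avg(values, weights):
        return sum(v * w for v, w in zip(values, weights))

    f = avg(Pi, pz)                                                # pr(A = 1)
    pz_treated = [p * q / f for p, q in zip(pz, Pi)]               # F(dz | A = 1)
    pz_control = [p * (1 - q) / (1 - f) for p, q in zip(pz, Pi)]   # F(dz | A = 0)

    unadj = avg(mu1, pz_treated) - avg(mu0, pz_control)
    adj1 = avg(mu1, pz_treated) - avg(mu0, pz_treated)
    adj0 = avg(mu1, pz_control) - avg(mu0, pz_control)
    adj = avg(mu1, pz) - avg(mu0, pz)

    def cov_with_pi(mu):
        return avg([q * m for q, m in zip(Pi, mu)], pz) - f * avg(mu, pz)

    c0, c1 = cov_with_pi(mu0), cov_with_pi(mu1)
    return {
        "treated": (adj1 - unadj, -c0 / (f * (1 - f))),
        "control": (adj0 - unadj, -c1 / (f * (1 - f))),
        "whole": (adj - unadj, -c0 / (1 - f) - c1 / f),
    }


# Hypothetical inputs: Pi non-decreasing and mu_a non-increasing in z.
PZ = [Fr(1, 4), Fr(1, 2), Fr(1, 4)]
PI = [Fr(1, 5), Fr(1, 2), Fr(4, 5)]
MU0 = [Fr(3), Fr(2), Fr(1)]
MU1 = [Fr(5), Fr(4), Fr(2)]

if __name__ == "__main__":
    for name, (left, right) in lemma_s9_sides(PZ, PI, MU0, MU1).items():
        print(name, left, right)
```

In this example all three differences are positive, so each adjusted estimator exceeds the unadjusted one, as the proof of Theorem 1 requires. 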


APPENDIX 2. PROOFS OF THEOREMS AND COROLLARIES IN THE MAIN TEXT 
Proof of Theorem 1. Because $\Pi(z) = \mathrm{pr}(A = 1 \mid Z = z)$ and $\mathrm{pr}(A = 1 \mid U = u)$ are non-decreasing 
in $z$ and $u$, and $E(Y \mid A = a, U = u)$ is non-decreasing in $u$ for both $a = 0$ and 
$1$, the unadjusted estimator, $\mathrm{ACE}^{\mathrm{unadj}}$, is larger than or equal to $\mathrm{ACE}^{\mathrm{true}}$, $\mathrm{ACE}_1^{\mathrm{true}}$ and $\mathrm{ACE}_0^{\mathrm{true}}$, 
according to Lemmas S2-S4. 
Because $\Pi(Z)$ is non-decreasing and $\mu_a(Z)$ is non-increasing in $Z$ for both $a = 0$ and $1$, their 
covariance is non-positive according to Lemma S1, i.e., $\mathrm{cov}\{\Pi(Z), \mu_a(Z)\} \leq 0$. 

Because the differences between all the adjusted estimators, $\mathrm{ACE}^{\mathrm{adj}}$, $\mathrm{ACE}_1^{\mathrm{adj}}$ and $\mathrm{ACE}_0^{\mathrm{adj}}$, 
and the unadjusted estimator, $\mathrm{ACE}^{\mathrm{unadj}}$, are negative constants multiplied by $\mathrm{cov}\{\Pi(Z), \mu_a(Z)\}$, 
according to Lemma S9 all of $\mathrm{ACE}^{\mathrm{adj}}$, $\mathrm{ACE}_1^{\mathrm{adj}}$, and $\mathrm{ACE}_0^{\mathrm{adj}}$ are larger than or equal to $\mathrm{ACE}^{\mathrm{unadj}}$. 

Proof of Theorem 2. The independence of Z and U implies that 


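$$
\begin{aligned}
\mathrm{pr}(A = 1 \mid Z = z) &= \int \mathrm{pr}(A = 1 \mid Z = z, U = u) F(du) = \beta(z) + E\{\gamma(U)\}, \\
\mathrm{pr}(A = 1 \mid U = u) &= \int \mathrm{pr}(A = 1 \mid Z = z, U = u) F(dz) = E\{\beta(Z)\} + \gamma(u)
\end{aligned}
$$

are non-decreasing in $z$ and $u$. Therefore, according to Theorem 1 we need only to verify that 
$E(Y \mid A = a, Z = z)$ is non-increasing in $z$ for both $a = 0$ and $1$. 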

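Because $Z \perp\!\!\!\perp U$ and $\mathrm{pr}(A = 1 \mid Z = z, U = u) = \beta(z) + \gamma(u)$ with non-decreasing $\beta(z)$ and 
$\gamma(u)$, we can apply Lemma S6, and conclude that $\partial F(u \mid A = a, Z = z)/\partial z \geq 0$. 

Write the essential infimum and supremum of $U$ given $(A = a, Z = z)$ as $\underline{u}(a, z)$ and $\bar{u}(a)$, 
with the latter depending only on $a$ according to Condition (c) of Theorem 2. Because $Y \perp\!\!\!\perp Z \mid 
(A, U)$, integration or summation by parts gives 

$$
\begin{aligned}
E(Y \mid A = a, Z = z)
&= \int E(Y \mid A = a, U = u, Z = z) F(du \mid A = a, Z = z) \\
&= \int m_a(u) F(du \mid A = a, Z = z) \\
&= m_a(u) F(u \mid A = a, Z = z)\Big|_{u = \underline{u}(a, z)}^{u = \bar{u}(a)} - \int_{\underline{u}(a, z)}^{\bar{u}(a)} \frac{\partial m_a(u)}{\partial u} F(u \mid A = a, Z = z)\,du \\
&= m_a\{\bar{u}(a)\} - \int_{\underline{u}(a, z)}^{\bar{u}(a)} \frac{\partial m_a(u)}{\partial u} F(u \mid A = a, Z = z)\,du.
\end{aligned}
$$

Therefore, its derivative with respect to $z$, 

$$
\frac{\partial E(Y \mid A = a, Z = z)}{\partial z} = -\int_{\underline{u}(a, z)}^{\bar{u}(a)} \frac{\partial m_a(u)}{\partial u} \cdot \frac{\partial F(u \mid A = a, Z = z)}{\partial z}\,du,
$$

is smaller than or equal to zero, because $\partial m_a(u)/\partial u \geq 0$ for both $a = 0$ and $1$ and for all $u$. 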


Proof of Corollary 1. According to Theorem 1 we need only to verify that $\mu_a(z) = E(Y \mid 
A = a, Z = z)$ is non-increasing in $z$ for both $a = 0$ and $1$. Following Lemma S5, for binary and 
independent $Z$ and $U$, monotonicity and no additive interaction imply (S5), which, according to 
Bayes' Theorem, is equivalent to 

$$
\frac{\mathrm{pr}(A = 1 \mid Z = 1, U = 1)\,\mathrm{pr}(A = 1 \mid Z = 0, U = 0)}{\mathrm{pr}(A = 1 \mid Z = 1, U = 0)\,\mathrm{pr}(A = 1 \mid Z = 0, U = 1)} = \mathrm{OR}_{ZU \mid A = 1} \leq 1, \tag{S7}
$$

$$
\frac{\mathrm{pr}(A = 0 \mid Z = 1, U = 1)\,\mathrm{pr}(A = 0 \mid Z = 0, U = 0)}{\mathrm{pr}(A = 0 \mid Z = 1, U = 0)\,\mathrm{pr}(A = 0 \mid Z = 0, U = 1)} = \mathrm{OR}_{ZU \mid A = 0} \leq 1. \tag{S8}
$$

The above inequalities (S7) and (S8) state that $Z$ and $U$ have negative association given each 
level of $A$, and therefore $\mathrm{pr}(U = 1 \mid A = a, Z = z)$ is non-increasing in $z$ for both $a = 1$ and $0$. 
Because $m_a(1) \geq m_a(0)$ and 

$$
\begin{aligned}
\mu_a(z) &= E(Y \mid A = a, Z = z) \\
&= \sum_{u = 0, 1} E(Y \mid A = a, Z = z, U = u)\,\mathrm{pr}(U = u \mid A = a, Z = z) \\
&= m_a(1)\,\mathrm{pr}(U = 1 \mid A = a, Z = z) + m_a(0)\{1 - \mathrm{pr}(U = 1 \mid A = a, Z = z)\} \\
&= \{m_a(1) - m_a(0)\}\,\mathrm{pr}(U = 1 \mid A = a, Z = z) + m_a(0),
\end{aligned}
$$

we know that $\mu_a(z)$ is non-decreasing in $\mathrm{pr}(U = 1 \mid A = a, Z = z)$. Therefore, $\mu_a(z)$ is 
non-increasing in $z$ for both $a = 1$ and $0$. 


Proof of Theorem 3. Because of the independence of $Z$ and $U$, we have $\mathrm{pr}(A = 1 \mid Z = z) = 
\beta(z)E\{\gamma(U)\}$ and $\mathrm{pr}(A = 1 \mid U = u) = E\{\beta(Z)\}\gamma(u)$ are non-decreasing in $z$ and $u$. 
According to Lemma S8, the multiplicative model of $A$ also implies that for both $a = 1$ and $0$ and for 
all $z$ and $u$, $\partial F(u \mid A = a, Z = z)/\partial z \geq 0$. Following exactly the same steps as in the proof of 
Theorem 2, we can prove Theorem 3. 


Proof of Corollary 2. For binary and independent Z and U, monotonicity, no multiplicative 
interaction, and Lemma S7 imply 


$$
\frac{p_{11}p_{00}}{p_{10}p_{01}} = 1 \leq 1, \qquad \frac{(1 - p_{11})(1 - p_{00})}{(1 - p_{10})(1 - p_{01})} \leq 1. \tag{S9}
$$


With the above results in (S9), the rest of the proof is the same as the proof of Corollary 1. 


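Proof of Theorem 4. First, we consider the treatment effect on the population under treatment. 
Taking $U = Y(0)$ in Lemma S3, we have $\mathrm{ACE}^{\mathrm{unadj}} \geq \mathrm{ACE}_1^{\mathrm{true}}$, because $A \perp\!\!\!\perp Y(0) \mid 
Y(0)$, $\mathrm{pr}\{A = 1 \mid Y(0)\}$ is non-decreasing in $Y(0)$, and $E\{Y \mid A = 0, Y(0)\} = Y(0)$ is 
non-decreasing in $Y(0)$. The condition $\mathrm{cov}\{\Pi, E(Y \mid A = 0, \Pi)\} \leq 0$ implies that $\mathrm{ACE}_1^{\mathrm{adj}} \geq 
\mathrm{ACE}^{\mathrm{unadj}}$ according to Lemma S9. Therefore, $\mathrm{ACE}_1^{\mathrm{adj}} \geq \mathrm{ACE}^{\mathrm{unadj}} \geq \mathrm{ACE}_1^{\mathrm{true}}$. 

Second, we take $U = Y(1)$ in Lemma S4, and by a similar argument as above we have 
$\mathrm{ACE}_0^{\mathrm{adj}} \geq \mathrm{ACE}^{\mathrm{unadj}} \geq \mathrm{ACE}_0^{\mathrm{true}}$. 

The conclusion holds because $\mathrm{ACE}^{\mathrm{true}} = f\,\mathrm{ACE}_1^{\mathrm{true}} + (1 - f)\,\mathrm{ACE}_0^{\mathrm{true}}$ and $\mathrm{ACE}^{\mathrm{adj}} = 
f\,\mathrm{ACE}_1^{\mathrm{adj}} + (1 - f)\,\mathrm{ACE}_0^{\mathrm{adj}}$. 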


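Proof of Theorem 5. Under the additive model of $A$ given $\Pi$ and $U = \{Y(1), Y(0)\}$, we have 
the following results. First, $\mathrm{pr}(A = 1 \mid \Pi) = \Pi$ is increasing in $\Pi$. Second, $\Pi \perp\!\!\!\perp \{Y(1), Y(0)\}$ 
implies 

$$
\begin{aligned}
\mathrm{pr}\{A = 1 \mid \Pi, Y(1) = y_1\}
&= \int \mathrm{pr}(A = 1 \mid \Pi, U) F(dy_0 \mid y_1) \\
&= \int \{\Pi + \delta(y_1) + \eta(y_0)\} F(dy_0 \mid y_1) \\
&= \Pi + \delta(y_1) + \int \eta(y_0) F(dy_0 \mid y_1) = \Pi + \tilde{\delta}(y_1).
\end{aligned}
$$

Denote the infimum and supremum of $Y(0)$ given $Y(1) = y_1$ by $\underline{y}_0(y_1)$ and $\bar{y}_0$, with the latter not 
depending on $y_1$ according to Condition (c) of Theorem 5. Applying integration or summation 
by parts, we have 

$$
\begin{aligned}
\tilde{\delta}(y_1)
&= \delta(y_1) + \eta(y_0) F(y_0 \mid y_1)\Big|_{y_0 = \underline{y}_0(y_1)}^{y_0 = \bar{y}_0} - \int \frac{\partial \eta(y_0)}{\partial y_0} F(y_0 \mid y_1)\,dy_0 \\
&= \delta(y_1) + \eta(\bar{y}_0) - \int \frac{\partial \eta(y_0)}{\partial y_0} F(y_0 \mid y_1)\,dy_0.
\end{aligned}
$$

The function $\tilde{\delta}(y_1)$ is non-decreasing in $y_1$, because 

$$
\frac{\partial \tilde{\delta}(y_1)}{\partial y_1} = \frac{\partial \delta(y_1)}{\partial y_1} - \int \left\{\frac{\partial \eta(y_0)}{\partial y_0}\right\}\left\{\frac{\partial F(y_0 \mid y_1)}{\partial y_1}\right\} dy_0 \geq 0.
$$

Third, following the same reasoning as the second argument, we have $\mathrm{pr}\{A = 1 \mid \Pi, Y(0) = 
y_0\} = \Pi + \tilde{\eta}(y_0)$, with $\tilde{\eta}(y_0)$ being a non-decreasing function of $y_0$. Fourth, $\Pi \perp\!\!\!\perp Y(1)$ implies 
$\mathrm{pr}\{A = 1 \mid Y(1) = y_1\} = f + \tilde{\delta}(y_1)$, which is non-decreasing in $y_1$. Fifth, $\Pi \perp\!\!\!\perp Y(0)$ implies 
$\mathrm{pr}\{A = 1 \mid Y(0) = y_0\} = f + \tilde{\eta}(y_0)$, which is non-decreasing in $y_0$. 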

According to the fourth and fifth arguments above, Condition (a) in Theorem 4 holds. Therefore, 
we need only to verify Condition (b) in Theorem 4 to complete the proof. 

We have shown that $\mathrm{pr}\{A = 1 \mid \Pi, Y(1)\} = \Pi + \tilde{\delta}\{Y(1)\}$, which is additive and 
non-decreasing in $\Pi$ and $Y(1)$. According to Lemma S6, we know that 

$$
\frac{\partial\,\mathrm{pr}\{Y(1) \leq y_1 \mid A = 1, \Pi = \pi\}}{\partial \pi} \geq 0 \tag{S10}
$$

for all $y_1$ and $\pi$. We have also shown that $\mathrm{pr}\{A = 1 \mid \Pi, Y(0)\} = \Pi + \tilde{\eta}\{Y(0)\}$, which is 
additive and non-decreasing in $\Pi$ and $Y(0)$. Again according to Lemma S6, we know that 

$$
\frac{\partial\,\mathrm{pr}\{Y(0) \leq y_0 \mid A = 0, \Pi = \pi\}}{\partial \pi} \geq 0 \tag{S11}
$$

for all $y_0$ and $\pi$. According to Xie et al. (2008), the above negative distributional associations in 
(S10) and (S11) imply the negative associations in expectation between Y (0) and II given A, as 
required by condition (b) of Theorem 4. 


Proof of Corollary 3. As shown in the proof of Theorem 5, the conclusion follows immediately 
from the five ingredients. We will show that they hold even if there is non-negative interaction 
between binary $Y(1)$ and $Y(0)$. The following proof is in parallel with the proof of Theorem 5. 

First, $\mathrm{pr}(A = 1 \mid \Pi) = \Pi$ is increasing in $\Pi$. Second, 

$$
\begin{aligned}
\mathrm{pr}\{A = 1 \mid \Pi, Y(1) = y_1\}
&= E[\mathrm{pr}\{A = 1 \mid \Pi, Y(1) = y_1, Y(0)\} \mid \Pi, Y(1) = y_1] \\
&= E\{\alpha + \Pi + \delta y_1 + \eta Y(0) + \theta y_1 Y(0) \mid \Pi, Y(1) = y_1\} \\
&= \alpha + \Pi + \delta y_1 + \eta\,\mathrm{pr}\{Y(0) = 1 \mid Y(1) = y_1\} + \theta y_1\,\mathrm{pr}\{Y(0) = 1 \mid Y(1) = y_1\} && \text{(S12)} \\
&= \Pi + \tilde{\delta}[y_1 - E\{Y(1)\}]. && \text{(S13)}
\end{aligned}
$$

The last equation in (S13) follows from the fact that $Y(1)$ is binary and the functional form must 
be linear in $y_1$, where the coefficient is 

$$
\begin{aligned}
\tilde{\delta} &= \mathrm{pr}\{A = 1 \mid \Pi, Y(1) = 1\} - \mathrm{pr}\{A = 1 \mid \Pi, Y(1) = 0\} \\
&= \delta + \eta[\mathrm{pr}\{Y(0) = 1 \mid Y(1) = 1\} - \mathrm{pr}\{Y(0) = 1 \mid Y(1) = 0\}] + \theta\,\mathrm{pr}\{Y(0) = 1 \mid Y(1) = 1\} && \text{(S14)} \\
&\geq \eta[\mathrm{pr}\{Y(0) = 1 \mid Y(1) = 1\} - \mathrm{pr}\{Y(0) = 1 \mid Y(1) = 0\}], && \text{(S15)}
\end{aligned}
$$

where (S14) follows from (S12), and (S15) follows from $\delta \geq 0$ and $\theta \geq 0$. Because $\mathrm{OR}_Y \geq 1$, 
the potential outcomes have non-negative association, implying that their risk difference $\mathrm{RD}_Y = 
\mathrm{pr}\{Y(0) = 1 \mid Y(1) = 1\} - \mathrm{pr}\{Y(0) = 1 \mid Y(1) = 0\} \geq 0$. Therefore, $\tilde{\delta} \geq 0$, and $\mathrm{pr}\{A = 1 \mid 
\Pi, Y(1)\}$ is additive and non-decreasing in $\Pi$ and $Y(1)$. 

Third, similar to the second argument, we have $\mathrm{pr}\{A = 1 \mid \Pi, Y(0) = y_0\} = \Pi + \tilde{\eta}[y_0 - 
E\{Y(0)\}]$ with $\tilde{\eta} \geq 0$. Therefore, $\mathrm{pr}\{A = 1 \mid \Pi, Y(0)\}$ is additive and non-decreasing in $\Pi$ 
and $Y(0)$. Fourth, $\Pi \perp\!\!\!\perp Y(1)$ implies that $\mathrm{pr}\{A = 1 \mid Y(1)\} = f + \tilde{\delta}[Y(1) - E\{Y(1)\}]$ is non-decreasing in $Y(1)$. 
Fifth, $\Pi \perp\!\!\!\perp Y(0)$ implies that $\mathrm{pr}\{A = 1 \mid Y(0)\} = f + \tilde{\eta}[Y(0) - E\{Y(0)\}]$ is non-decreasing in $Y(0)$. 

With these five ingredients, the rest of the proof is exactly the same as the proof of Theorem 5. 

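Proof of Theorem 6. First, $\mathrm{pr}(A = 1 \mid \Pi) = \Pi$ is non-decreasing in $\Pi$. Second, 

$$
\mathrm{pr}\{A = 1 \mid \Pi, Y(1) = y_1\} = \Pi\,\delta(y_1) \int \eta(y_0) F(dy_0 \mid y_1) = \Pi\,\tilde{\delta}(y_1)
$$

is multiplicative and non-decreasing in $\Pi$ and $y_1$, following the same argument as the proof of 
Theorem 5. Third, $\mathrm{pr}\{A = 1 \mid \Pi, Y(0) = y_0\} = \Pi\,\tilde{\eta}(y_0)$ is multiplicative and non-decreasing in 
$\Pi$ and $y_0$. Fourth, $\mathrm{pr}\{A = 1 \mid Y(1) = y_1\} = f\,\tilde{\delta}(y_1)$ is non-decreasing in $y_1$. Fifth, $\mathrm{pr}\{A = 1 \mid 
Y(0) = y_0\} = f\,\tilde{\eta}(y_0)$ is non-decreasing in $y_0$. 
The multiplicative models and Lemma S8 imply that for all $\pi$, $y_1$ and $y_0$, 

$$
\frac{\partial\,\mathrm{pr}\{Y(1) \leq y_1 \mid A = 1, \Pi = \pi\}}{\partial \pi} = 0, \qquad
\frac{\partial\,\mathrm{pr}\{Y(0) \leq y_0 \mid A = 0, \Pi = \pi\}}{\partial \pi} \geq 0. \tag{S16}
$$

The rest of the proof is the same as the proof of Theorem 5. 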


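Proof of Corollary 4. First, $\mathrm{pr}(A = 1 \mid \Pi) = \Pi$ is non-decreasing in $\Pi$. Second, 

$$
\mathrm{pr}\{A = 1 \mid \Pi, Y(1) = y_1\} = \alpha\,\Pi\,\delta^{y_1} E\{\eta^{Y(0)}\theta^{y_1 Y(0)} \mid Y(1) = y_1\} = \tilde{\alpha}\,\Pi\,\tilde{\delta}^{y_1},
$$

where the functional form must be multiplicative because $Y(1)$ is binary, and the parameter $\tilde{\delta}$ is 

$$
\begin{aligned}
\tilde{\delta} &= \frac{\mathrm{pr}\{A = 1 \mid \Pi, Y(1) = 1\}}{\mathrm{pr}\{A = 1 \mid \Pi, Y(1) = 0\}}
= \delta \times \frac{E\{\eta^{Y(0)}\theta^{Y(0)} \mid Y(1) = 1\}}{E\{\eta^{Y(0)} \mid Y(1) = 0\}} \\
&= \delta \times \frac{\eta\theta\,\mathrm{pr}\{Y(0) = 1 \mid Y(1) = 1\} + \mathrm{pr}\{Y(0) = 0 \mid Y(1) = 1\}}{\eta\,\mathrm{pr}\{Y(0) = 1 \mid Y(1) = 0\} + \mathrm{pr}\{Y(0) = 0 \mid Y(1) = 0\}} \\
&\geq \delta \times \frac{(\eta - 1)\,\mathrm{pr}\{Y(0) = 1 \mid Y(1) = 1\} + 1}{(\eta - 1)\,\mathrm{pr}\{Y(0) = 1 \mid Y(1) = 0\} + 1}.
\end{aligned}
$$

Because $\mathrm{OR}_Y \geq 1$, we have $\mathrm{pr}\{Y(0) = 1 \mid Y(1) = 1\} \geq \mathrm{pr}\{Y(0) = 1 \mid Y(1) = 0\}$, which 
implies that $\tilde{\delta} \geq 1$. Therefore, $\mathrm{pr}\{A = 1 \mid \Pi, Y(1)\}$ is multiplicative and non-decreasing in $\Pi$ 
and $Y(1)$. Third, we can similarly show that $\mathrm{pr}\{A = 1 \mid \Pi, Y(0)\}$ is multiplicative and 
non-decreasing in $\Pi$ and $Y(0)$. Fourth, $\mathrm{pr}\{A = 1 \mid Y(1) = y_1\} = \tilde{\alpha} f \tilde{\delta}^{y_1}$ is non-decreasing in $y_1$. 
Fifth, $\mathrm{pr}\{A = 1 \mid Y(0) = y_0\} = \tilde{\alpha} f \tilde{\eta}^{y_0}$ is non-decreasing in $y_0$. 
The rest of the proof is the same as the proof of Theorem 6. 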


Proof of Theorem 7. In Figure 4, $Z$ and $U$ are two independent confounders for the relationship 
between $A$ and $Y$. Because $\mathrm{pr}(A = 1 \mid Z = z, U = u)$ and $E(Y \mid A = a, Z = z, U = u)$ 
are non-decreasing in $z$ and $u$ for both $a = 0$ and $1$, Lemmas S2-S4 imply that the unadjusted 
estimator, $\mathrm{ACE}^{\mathrm{unadj}}$, is larger than or equal to $\mathrm{ACE}^{\mathrm{true}}$, $\mathrm{ACE}_1^{\mathrm{true}}$ and $\mathrm{ACE}_0^{\mathrm{true}}$. 

The independence between $Z$ and $U$ implies $\mathrm{pr}(A = 1 \mid Z = z) = \int \mathrm{pr}(A = 1 \mid Z = z, U = 
u) F(du)$, and the monotonicity of $\mathrm{pr}(A = 1 \mid Z = z, U = u)$ in $z$ implies that $\mathrm{pr}(A = 1 \mid Z = 
z)$ is non-decreasing in $z$. The rest of the proof is identical to the proof of Theorem 1. 


APPENDIX 3. EXTENSIONS TO OTHER CAUSAL MEASURES 
Appendix 3·1. Distributional Causal Effects 
Sometimes we are also interested in estimating the distributional causal effects (Ju & Geng, 
2010) for the treatment, control and whole populations: 

$$
\begin{aligned}
\mathrm{DCE}_1^{\mathrm{true}}(y) &= \mathrm{pr}\{Y(1) > y \mid A = 1\} - \mathrm{pr}\{Y(0) > y \mid A = 1\}, \\
\mathrm{DCE}_0^{\mathrm{true}}(y) &= \mathrm{pr}\{Y(1) > y \mid A = 0\} - \mathrm{pr}\{Y(0) > y \mid A = 0\}, \\
\mathrm{DCE}^{\mathrm{true}}(y) &= \mathrm{pr}\{Y(1) > y\} - \mathrm{pr}\{Y(0) > y\}.
\end{aligned}
$$

The unadjusted estimator is 

$$
\mathrm{DCE}^{\mathrm{unadj}}(y) = \mathrm{pr}(Y > y \mid A = 1) - \mathrm{pr}(Y > y \mid A = 0).
$$

The adjusted estimators for the treatment, control and whole populations are 

$$
\begin{aligned}
\mathrm{DCE}_1^{\mathrm{adj}}(y) &= \mathrm{pr}(Y > y \mid A = 1) - \int \mathrm{pr}(Y > y \mid A = 0, z) F(dz \mid A = 1), \\
\mathrm{DCE}_0^{\mathrm{adj}}(y) &= \int \mathrm{pr}(Y > y \mid A = 1, z) F(dz \mid A = 0) - \mathrm{pr}(Y > y \mid A = 0), \\
\mathrm{DCE}^{\mathrm{adj}}(y) &= \int \mathrm{pr}(Y > y \mid A = 1, z) F(dz) - \int \mathrm{pr}(Y > y \mid A = 0, z) F(dz).
\end{aligned}
$$

If the outcome is binary, then the distributional causal effects equal the average causal 
effects for $0 \leq y < 1$ and are zero otherwise. All results about distributional causal effects reduce to average causal 
effects for binary outcome. For a general outcome, the distributional causal effects are the 
average causal effects on the dichotomized outcome $I_y = I(Y > y)$. Therefore, if we replace the 
outcome $Y$ by $I_y$ in Theorems 1-3, the results about Z-Bias hold for distributional effects. For 
instance, the condition that $\mathrm{pr}(Y > y \mid A = a, U = u)$ is non-decreasing in $u$ for all $a$ is the 
same as requiring a non-negative sign on the arrow $U \to Y$, according to the theory of signed 
directed acyclic graphs (VanderWeele & Robins, 2010). The following corollary states the results 
analogous to Theorems 4-6. 


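COROLLARY S5. In the causal diagram of Figure 2, if for all $y$ and for both $a = 1$ and $0$, 

(a) $\mathrm{pr}\{Y(a) > y \mid A = 1\} \geq \mathrm{pr}\{Y(a) > y \mid A = 0\}$; 
(b) $\mathrm{cov}\{\Pi, \mathrm{pr}(Y > y \mid A = a, \Pi)\} \leq 0$; 

then 

$$
\begin{pmatrix} \mathrm{DCE}_1^{\mathrm{adj}}(y) \\ \mathrm{DCE}_0^{\mathrm{adj}}(y) \\ \mathrm{DCE}^{\mathrm{adj}}(y) \end{pmatrix}
\geq
\begin{pmatrix} \mathrm{DCE}^{\mathrm{unadj}}(y) \\ \mathrm{DCE}^{\mathrm{unadj}}(y) \\ \mathrm{DCE}^{\mathrm{unadj}}(y) \end{pmatrix}
\geq
\begin{pmatrix} \mathrm{DCE}_1^{\mathrm{true}}(y) \\ \mathrm{DCE}_0^{\mathrm{true}}(y) \\ \mathrm{DCE}^{\mathrm{true}}(y) \end{pmatrix}. \tag{S17}
$$

Under the conditions of Theorems 5 and 6, (S17) holds. 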


Proof of Corollary S5. Condition (a) of Corollary S5 is equivalent to $\mathrm{pr}\{A = 1 \mid I_y(a) = 
1\} \geq \mathrm{pr}\{A = 1 \mid I_y(a) = 0\}$, and Condition (b) of Corollary S5 is equivalent to $\mathrm{cov}\{\Pi, E(I_y \mid 
A = a, \Pi)\} \leq 0$. Therefore, the conclusion follows from Theorem 4. 

According to the proofs of Theorems 5 and 6, we have 

$$
\mathrm{pr}\{A = 1 \mid I_y(a) = 1\} = \mathrm{pr}\{A = 1 \mid Y(a) > y\} \geq \mathrm{pr}\{A = 1 \mid Y(a) = y\}
\geq \mathrm{pr}\{A = 1 \mid Y(a) \leq y\} = \mathrm{pr}\{A = 1 \mid I_y(a) = 0\},
$$

because of monotonicity of $\mathrm{pr}\{A = 1 \mid Y(a)\}$ in $Y(a)$. Therefore, Condition (a) of Corollary S5 
holds. Under the conditions of Theorems 5 and 6, we have also shown in (S10)-(S16) that for 
all $a$, $y$ and $\pi$, $\partial\,\mathrm{pr}(Y \leq y \mid A = a, \Pi = \pi)/\partial \pi \geq 0$, which implies that $E(I_y \mid A = a, \Pi = \pi)$ 
is non-increasing in $\pi$. Therefore, Condition (b) of Corollary S5 holds. The proof is complete. 


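Appendix 3·2. Ratio Measures 

In many applications with binary or positive outcomes, we are also interested in assessing 
causal effects on the ratio scale for the treatment, control and whole populations, defined as 

$$
\mathrm{RR}_1^{\mathrm{true}} = \frac{E\{Y(1) \mid A = 1\}}{E\{Y(0) \mid A = 1\}}, \qquad
\mathrm{RR}_0^{\mathrm{true}} = \frac{E\{Y(1) \mid A = 0\}}{E\{Y(0) \mid A = 0\}}, \qquad
\mathrm{RR}^{\mathrm{true}} = \frac{E\{Y(1)\}}{E\{Y(0)\}}.
$$

The unadjusted estimator on the ratio scale is 

$$
\mathrm{RR}^{\mathrm{unadj}} = \frac{E(Y \mid A = 1)}{E(Y \mid A = 0)}.
$$

The adjusted estimators on the ratio scale for the treatment, control and whole populations are 

$$
\begin{aligned}
\mathrm{RR}_1^{\mathrm{adj}} &= \frac{E(Y \mid A = 1)}{\int E(Y \mid A = 0, Z = z) F(dz \mid A = 1)}, \\
\mathrm{RR}_0^{\mathrm{adj}} &= \frac{\int E(Y \mid A = 1, Z = z) F(dz \mid A = 0)}{E(Y \mid A = 0)}, \\
\mathrm{RR}^{\mathrm{adj}} &= \frac{\int E(Y \mid A = 1, Z = z) F(dz)}{\int E(Y \mid A = 0, Z = z) F(dz)}.
\end{aligned}
$$

With a general instrumental variable $Z$, we can replace $Z$ by $\Pi$ in the definitions of the adjusted 
estimators. 

COROLLARY S6. All the theorems and corollaries in §§3 and 4 hold on the ratio scale, i.e., 
under their conditions, 

$$
\begin{pmatrix} \mathrm{RR}_1^{\mathrm{adj}} \\ \mathrm{RR}_0^{\mathrm{adj}} \\ \mathrm{RR}^{\mathrm{adj}} \end{pmatrix}
\geq
\begin{pmatrix} \mathrm{RR}^{\mathrm{unadj}} \\ \mathrm{RR}^{\mathrm{unadj}} \\ \mathrm{RR}^{\mathrm{unadj}} \end{pmatrix}
\geq
\begin{pmatrix} \mathrm{RR}_1^{\mathrm{true}} \\ \mathrm{RR}_0^{\mathrm{true}} \\ \mathrm{RR}^{\mathrm{true}} \end{pmatrix}.
$$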
Proof of Corollary S6. First, $\mathrm{RR}^{\mathrm{true}}$ is a convex combination of $\mathrm{RR}_1^{\mathrm{true}}$ and $\mathrm{RR}_0^{\mathrm{true}}$, and $\mathrm{RR}^{\mathrm{adj}}$ 
is a convex combination of $\mathrm{RR}_1^{\mathrm{adj}}$ and $\mathrm{RR}_0^{\mathrm{adj}}$, which are formally stated in Ding & VanderWeele 
(2016, eAppendix). Then the conclusion follows from the proofs of the theorems above. 


Appendix 3·3. Average Over Observed Covariates 


In practice, we need to adjust for the observed covariates X that are confounders affecting 
both the treatment and outcome. The discussion in previous sections is conditional on or within 
strata of observed covariates X, and the causal effects and their estimators are given X. For 
example, 


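$$
\begin{aligned}
\mathrm{ACE}^{\mathrm{true}}(x) &= E\{Y(1) \mid X = x\} - E\{Y(0) \mid X = x\}, \\
\mathrm{ACE}^{\mathrm{unadj}}(x) &= E(Y \mid A = 1, X = x) - E(Y \mid A = 0, X = x), \\
\mathrm{ACE}^{\mathrm{adj}}(x) &= \int E(Y \mid A = 1, Z = z, X = x) F(dz \mid X = x) \\
&\quad - \int E(Y \mid A = 0, Z = z, X = x) F(dz \mid X = x),
\end{aligned}
$$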


and other conditional quantities can be analogously defined. If the conditions in the theorems and 
corollaries in §§3 and 4 hold within each level of X, then the conclusions in (1) and (S17) hold 
not only within each level of X but also averaged over X. For example, for the average causal 
effects, we have 


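$$
\begin{pmatrix}
\int \mathrm{ACE}_1^{\mathrm{adj}}(x) F(dx \mid A = 1) \\
\int \mathrm{ACE}_0^{\mathrm{adj}}(x) F(dx \mid A = 0) \\
\int \mathrm{ACE}^{\mathrm{adj}}(x) F(dx)
\end{pmatrix}
\geq
\begin{pmatrix}
\int \mathrm{ACE}^{\mathrm{unadj}}(x) F(dx \mid A = 1) \\
\int \mathrm{ACE}^{\mathrm{unadj}}(x) F(dx \mid A = 0) \\
\int \mathrm{ACE}^{\mathrm{unadj}}(x) F(dx)
\end{pmatrix}
\geq
\begin{pmatrix}
\int \mathrm{ACE}_1^{\mathrm{true}}(x) F(dx \mid A = 1) \\
\int \mathrm{ACE}_0^{\mathrm{true}}(x) F(dx \mid A = 0) \\
\int \mathrm{ACE}^{\mathrm{true}}(x) F(dx)
\end{pmatrix}.
$$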


